SEMIOTICS: THE TOOLBOX OF INTELLIGENCE


Editor's  Preface 

Intelligent  Systems 

Intelligence  is  magic. 

We do not call things that we easily understand intelligence. Such a thing is considered just an
algorithm and, like any algorithm, it seems mundane and mechanical.
Yet magic accomplishes its goals in a presumably non-algorithmic way.

(I mean magic, not sorcery: although both rely on incantations, i.e., use some kind of
linguistic tools, sorcery employs evil forces, while our focus will be on useful applications like
robotics, intelligent manufacturing, language translation, automated discovery, unmanned
energy systems, large health care programs, analysis of music and poetry and, well... the battlefield too.)

This  is  why  Sukhan  Lee,  one  of  the  organizers  of  the  Symposium  on  Computational  Intelligence 
(Monterey,  July  9-11,  1997),  was  surprised  by  the  discovery  that  the  conference  could  actually 
be  called  "IEEE  International  Symposium  on  Computational  Magic."  The  panel,  at  which  it  was 
discussed,  noticed  that  the  similarity  between  magic  and  intelligence  is  in  the  depth  and  clarity  of 
our  knowledge  of  them. 

This reminds us of the commonality between intelligence and quantum mechanics. Many people
are looking forward to uncovering the mystery of intelligence by means of quantum
mechanics. The commonality lies, again, in the depth and the clarity of
our knowledge of them.

We cannot avoid using the term intelligence. We can even see some benefits in attributing the
term "intelligent system" to many real-world systems that are not clearly defined: the ones that
demonstrate some magic that is typical of intelligence. We are dreaming of intelligent robots that
are capable of getting to the core of an unexpected situation. Say a particular situation is not
present in the list of behavior rules. Yet, the robot can figure out how to deal with it. That is
intelligence! 

When  the  famous  Deep  Blue  browses  the  list  of  precomputed  possibilities  with  breathtaking 
speed, we are disappointed. We would prefer a slower thinker to come up with a winning solution
with no browsing at all, possibly by magic.

Mikhail Botvinnik, the Chess Champion of the post-WWII era, envisioned a multiresolutional
system of planning in future chess-playing computers. Multi-level decision making was his
interpretation of his own mechanisms of winning.

The intelligent system is a metaphor for a system that functions yet cannot be fully
understood. To understand it, we have to use another intelligent system, ourselves, which we
cannot fully understand either. The metaphor of the intelligent system allows for prediction and
control of the systems of interest, although the properties and the mechanisms of these systems
are not totally clear to us. We are compelled to use this metaphor and this system, although its
algorithm cannot be clear to us now (and maybe will never be clear at all).

We are interested in the scientific and practical benefits of using this metaphor and its theories. The
metaphor of the intelligent system may be applicable in all domains and at all levels of granularity, or
resolution. We have to show this in the future. If we succeed, then the embedded
incompleteness of our theory will be forgiven by the scientific community (especially because in
the case of intelligent systems this incompleteness is forever!).

Can  we  say  something  substantial  about  intelligent  systems  today? 

We  can  say  that  intelligent  systems  are  generalizing  machines.  Their  way  of  surviving  is  to 
establish  a  trade-off  between  unbearable  multiplicity  of  needs  and  desires  and  an  unbearable 
amount of information processing, which is required for this. They generalize by virtue of
grouping (clustering), focusing attention, and combinatorial searching.

At the same time, the procedure of generalization is always associated with a loss of information.
This is why the bottom-up process of re-generalization should be repeated constantly. Sooner or
later, the entities of the upper level will change inadvertently. When new information is
incorporated, the procedure of regrouping, refocusing, and re-searching is performed again.
Thus, the previously created multiresolutional system of representation is changed as a
partial effect of such a learning process.

(See? Would it be sufficient to look for the explanation of intelligence only within quantum
physics? Quantum mechanics itself is a product of the multiresolutional system of intelligence.)

Semiotics 

Semiotics  is  the  only  system  of  thought  that  dares  to  consider  itself  as  an  object  of  its  own 
analysis.  Like  intelligence,  semiotics  is  a  self-referential  and  self-reflective  system.  All 
components  of  intelligence  including  generalization  with  its  focusing  attention,  search  and 
clustering are parts of semiotics. Actually, what we called the learning process is called
semiosis in semiotics.

Semioticians are concerned with the issue "where is our niche?" This is an important issue, but
you  have  to  specify  what  level  of  resolution  you  are  talking  about.  Indeed,  you  may  look  for  a 
niche  in  one  room  versus  another,  in  one  country  or  state  versus  another,  or  even  in  one  galaxy 
versus  another. 

When I talk about this search for a niche, I imagine crowds of scientists intruding into the Wild West of
Intelligence looking for a place to prosper, or a crowd of scientists like gold miners trying to
stake a claim in the Klondike of Intelligence. Remember that everything happened literally before our
eyes. Indeed, Ampere's "Kybernetes" and Maxwell's "On Governors" appeared in the middle
of the 19th century. At about the same time, C. Peirce wrote his papers on semiotics. Wiener's
"Cybernetics" was published a century later. We did not even have enough time to properly
digest these works.

The  invariance  of  results  in  different  scientific  domains  is  not  in  looking  for  answers  to  the 
question  "What?"  but  rather  in  looking  for  answers  to  the  question  "How?" 
Semiotics is the discipline that looks for the "How?" of all domains together, because our intelligence
is common to all domains.

The  well-known  saying  is  that  mathematics  is  the  language  of  Science. 
Yes,  but  it  uses  Semiotics  as  its  Toolbox,  the  Metalanguage. 

Semiotics  supplies  its  Metalanguage  to  all  disciplines  of  Science  that  are  therefore  dialects  of 
Semiotics. 

Semiotics is mathematics that aspires to symbol grounding.

Semiotics  is  mathematics  that  does  not  delegate  symbol-grounding  to  engineers. 

Semiotics is mathematics that remembers that all axiomatic theories are just alternatives of World
simulation or for World simulation. It teaches engineers how to perform symbol grounding after a
simulation is done.

Let us ask the question, "What is Life: a playground or a place with situated meaning?" Semiotics
does not yield to the temptation to answer "a little bit of both." Semiotics is looking for, interested
in, and striving toward a situated meaning of Life.

For  semiotics,  the  existentialist  issue  "I  think,  therefore  I  am"  does  not  exist.  It  is  merely  a 
philosophical  question.  For  semiotics,  the  insight  "I  think,  therefore  I  axiomatize"  has  only  a  partial 
importance. This is a mathematical issue. For semiotics, the issue is, "I think, therefore I must
understand how I do it."

Yes, mathematics is a dialect of semiotics. But semiotics always attacks the problem before
mathematics  does.  Mathematics  can  be  applied  only  after  we  understand  the  system  well  or  can 
pretend  to  understand  it  well. 

Conference ISAS '97

Jourdain, the protagonist of Molière's play "Le Bourgeois Gentilhomme," was surprised when he
found out that he spoke prose. Most professionals in science and other intellectual areas
have no idea that they speak semiotics.

We  have  a  mission:  to  demonstrate  the  power  of  semiotic  tools  to  them,  to  promulgate 
understanding,  to  diminish  confusion,  to  enhance  clarity. 

Semiotics is the invariant of all disciplines represented at our conference. Participants of the
conference  and  readers  of  this  volume  will  easily  recognize  the  relevance  of  each  discipline  both 
to  intelligent  systems  and  to  semiotics. 

Some of these quests will be easy; some will require an effort. The theory of games is ultimately
semiotic. It is self-referential. Its implicit component is reflection, and it recognizes that knowing the theory
does not guarantee winning the game. When you think about it, the theory of automata can be
considered an attempt to decompose the Peircean triangle into its components.

I  hope  that  the  toolbox  of  semiotics  will  be  useful  in  your  particular  domain  of  science,  in  its 
particular  dialect.  I  hope  that  it  will  enhance  your  power  of  analysis  and  interpretation,  and  allow 
you  to  express  the  meaning  we  always  search  for. 

I wish you happy semiotic analysis of the many complex intelligent systems you will encounter after
this Conference.

A.  Meystel 
July 26, 1997

I PLENARY LECTURES


Why  Now  Semiotics? 

From  Real-Time  Control 
to  Signs  and  Symbols 


James  S.  Albus 
Chief,  Intelligent  Systems  Division 
Manufacturing  Engineering  Laboratory 
National  Institute  of  Standards  and  Technology 
albus@cme.nist.gov 


Why  is  NIST  hosting  a  conference  on  Semiotics? 
NIST's  primary  interest  is  in  intelligent  control 
systems  for  manufacturing  and  in  standards  and 
measures  for  open  architecture  controllers.  What  has 
that  to  do  with  the  science  (dare  I  say  the  philosophy) 
of  signs  and  symbols?  What  has  semiotics  to  say  to 
manufacturing?  You  can  read  for  a  long  time  in  the 
semiotics  literature  before  you  encounter  a  reference  to 
the  problem  of  data  communications  between  a  CAD 
database  and  a  numerically  controlled  machine  tool.  So 
where  is  the  connection?  What  does  the  engineering  of 
intelligent  real-time  control  systems  have  in  common 
with  the  science  of  signs  and  symbols? 

Control 

To  answer  this  last  question,  let's  take  a  look  at 
the  field  of  control.  The  history  of  real-time  control 
begins  with  the  invention  of  the  fly-ball  governor.  The 
problem  was  how  to  regulate  the  flow  of  steam  to 
control  the  speed  of  a  steam  engine  under  variable  load 
conditions.  The  solution  was  feedback.  If  the  engine 
was  running  too  slowly,  the  fly-ball  arms  fell  down, 
opening  a  valve  and  allowing  more  steam  to  flow.  If 
the  engine  was  running  too  fast,  the  fly-ball  arms  flew 
out,  closing  the  valve  and  slowing  the  flow  of  steam. 
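
The feedback principle behind the governor can be sketched in a few lines of Python. This is only an illustrative toy, not a model of any real engine: the first-order engine dynamics, the constants (torque_per_valve, friction, inertia), the valve limits, and the function name steady_speed are all assumptions introduced here. The valve opening is raised in proportion to how far the speed has fallen below the target, which is what the falling fly-ball arms do mechanically.

```python
def steady_speed(load, use_governor, steps=4000, dt=0.05):
    """Run a toy steam engine until its speed settles and return that speed.

    Assumed (hypothetical) dynamics:
        inertia * d(speed)/dt = torque_per_valve * valve - load - friction * speed
    """
    torque_per_valve, friction, inertia = 200.0, 1.0, 10.0
    target, base_valve, gain = 100.0, 0.6, 0.05

    speed = 0.0
    for _ in range(steps):
        valve = base_valve
        if use_governor:
            # Feedback: too slow -> arms fall -> valve opens further;
            # too fast -> arms fly out -> valve closes.
            valve += gain * (target - speed)
        valve = min(1.0, max(0.0, valve))  # a valve cannot open past its stops
        speed += dt * (torque_per_valve * valve - load - friction * speed) / inertia
    return speed


def speed_drop(use_governor, light_load=20.0, heavy_load=40.0):
    """How much the settled speed falls when the load is doubled."""
    return steady_speed(light_load, use_governor) - steady_speed(heavy_load, use_governor)


if __name__ == "__main__":
    print("speed drop without governor:", round(speed_drop(False), 2))
    print("speed drop with governor:   ", round(speed_drop(True), 2))
```

In this toy model the same load increase costs the ungoverned engine about a fifth of its speed, while the governed engine loses only a small fraction of that. A purely proportional governor still leaves a small residual error, which is why the governed speed does not return exactly to the target.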

The  development  of  feedback  control  theory 
reached  a  peak  during  World  War  II  with  the 
development  of  radar  controlled  anti-aircraft  guns. 
Norbert  Wiener  was  the  intellectual  leader  of  this  effort. 
He  founded  the  field  of  cybernetics.  Feedback  control 
theory  matured  in  the  aerospace  industry  during  the 
Cold  War  race  for  supremacy  in  the  skies  and  outer 
space.  Today,  feedback  control  theory  provides  the 
mathematical  foundations  for  servo  controllers  used  in 
millions  of  applications. 

Feedforward  control  is  simply  a  recognition  that  if 
you  know  enough  about  the  system  you  are 
controlling,  you  can  design  a  pattern  of  inputs  to  the 
controller  that  will  produce  the  desired  output  from  the 
system. 


One  example  of  feedforward  control  is  the  control 
program  for  a  machine  tool.  The  machine  tool  and  the 
metal  cutting  processes  are  sufficiently  well 
characterized  so  that  low  level  servo  controllers  can 
guarantee  that  the  tool  will  go  to  the  points  along  the 
paths  designated  in  the  program.  A  typical  machine 
tool  program  may  include  thousands  of  steps  which 
proceed  from  start  to  finish  with  essentially  no  feedback 
from  the  machine. 

A  more  interesting  example  of  feedforward  control 
is  the  use  of  plant  models  (and  inverse  plant  models) 
for  process  control.  If  you  want  to  get  a  plant  to 
produce  a  certain  output,  first  develop  a  plant  model 
(transition  matrix  between  inputs,  states,  and  outputs). 
Then,  invert  the  model.  When  you  apply  the  desired 
output  to  the  inverse  plant  model,  it  will  produce  the 
proper control sequence that, if input to the plant, will
cause it to generate the desired output. In other words,
an  inverse  plant  model  put  in  series  with  a  modeled 
plant  produces  a  transfer  function  of  unity.  Thus,  the 
desired  output  applied  to  the  series  combination  of  the 
inverse  model  and  the  plant  itself  will  produce  the 
desired  output  from  the  plant.  It  is  amazing  that  this 
works.  How  well  it  works  depends  on  the  fidelity  of 
the  inverse  plant  model.  Bernie  Widrow  just  published 
a  book  on  how  to  use  neural  nets  to  develop  inverse 
plant  models  even  for  non-minimum  phase  plants. 
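
A minimal sketch of the inverse-model idea, assuming a scalar first-order plant y[k+1] = a*y[k] + b*u[k]. The plant, its coefficients, and the function names are hypothetical choices made for illustration; the neural-net methods mentioned above are not reproduced here. The inverse model turns a desired output sequence into a control sequence without ever looking at the real plant output, so how well the plant follows depends entirely on how well a_hat and b_hat match a and b.

```python
def inverse_plant(desired, a_hat, b_hat, y0=0.0):
    """Map a desired output sequence to the control sequence that should produce it.

    Inverts the assumed model y[k+1] = a_hat * y[k] + b_hat * u[k], taking the
    previous desired value as the predicted current state (pure feedforward).
    """
    controls, previous = [], y0
    for target in desired:
        controls.append((target - a_hat * previous) / b_hat)
        previous = target
    return controls


def plant(controls, a=0.9, b=0.5, y0=0.0):
    """The 'real' plant: y[k+1] = a * y[k] + b * u[k]."""
    outputs, y = [], y0
    for u in controls:
        y = a * y + b * u
        outputs.append(y)
    return outputs


def worst_error(desired, a_hat, b_hat):
    """Largest gap between desired and actual output for a given model."""
    actual = plant(inverse_plant(desired, a_hat, b_hat))
    return max(abs(d - y) for d, y in zip(desired, actual))


if __name__ == "__main__":
    desired = [1.0, 2.0, 1.5, 0.5]
    print("exact model:     ", worst_error(desired, a_hat=0.9, b_hat=0.5))
    print("mismatched model:", worst_error(desired, a_hat=0.8, b_hat=0.5))
```

With an exact model the series combination behaves as unity, up to floating-point rounding. With a mismatched model the error is never corrected, because nothing is fed back from the plant.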

In  the  field  of  artificial  intelligence  (AI)  and 
robotics,  a  great  deal  of  research  has  been  done  on 
planning.  A  plan  expressed  as  a  series  of  commands  is 
essentially  a  feedforward  control  sequence.  Thus, 
planning  is  a  form  of  feedforward  control.  A  planner 
uses  a  plant  model  to  predict  the  results  of  hypothesized 
actions. The ability to predict is a key (perhaps THE
key) to intelligent behavior. Prediction theory was
first developed by Kalman for use in designing
recursive filters. A Kalman filter predicts one sample
period into the future. An AI planner requires a model
that can predict the future out to a planning horizon
which may be many seconds, minutes, hours, or days
into the future.
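
The difference between one-step prediction and prediction out to a planning horizon can be illustrated with a scalar Kalman filter. This is a hedged sketch: the random-walk state model, the noise variances q and r, and the function names are assumptions made here for illustration. The filter alternates predict and update while measurements arrive; a planner, in contrast, must apply the prediction step repeatedly with no measurements, and the uncertainty of the prediction grows with every step.

```python
def predict(x, p, a=1.0, q=0.01):
    """One-sample-ahead prediction of the state estimate x and its variance p."""
    return a * x, a * a * p + q


def update(x, p, z, r=0.25):
    """Correct the prediction with a measurement z of variance r."""
    gain = p / (p + r)
    return x + gain * (z - x), (1.0 - gain) * p


def filter_measurements(measurements, x=0.0, p=1.0):
    """Standard recursive filtering: predict one step, then update."""
    for z in measurements:
        x, p = predict(x, p)
        x, p = update(x, p, z)
    return x, p


def predict_horizon(x, p, steps):
    """Planner-style prediction: repeat the predict step with no measurements."""
    path = []
    for _ in range(steps):
        x, p = predict(x, p)
        path.append((x, p))
    return path


if __name__ == "__main__":
    x, p = filter_measurements([1.0, 1.1, 0.9])
    for step, (xk, pk) in enumerate(predict_horizon(x, p, 5), start=1):
        print(f"{step} steps ahead: estimate={xk:.3f} variance={pk:.3f}")
```

The printed variances increase with the number of steps ahead, which is one way to see why long planning horizons call for models at coarser resolution than a sample-by-sample filter.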

Recently,  controller  developers  began  to  combine 
feedback  and  feedforward  control  techniques  to  achieve 
the  advantages  of  both.  AI  system  designers  have  begun 
to  combine  deliberative  planners  (feedforward)  with 
reactive  (feedback)  controllers.  Control  theory  is 
beginning  to  focus  on  prediction  --  to  anticipate  the 
requirements  of  the  future  in  addition  to  compensating 
for  errors  from  the  past. 
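
A small sketch of what combining the two techniques buys, under stated assumptions: a hypothetical scalar plant y[k+1] = a*y[k] + b*u[k], a deliberately imperfect inverse model (a_hat differs from a), and a simple proportional feedback term with gain kp. None of these values come from a real controller. The feedforward term anticipates what the next target requires; the feedback term compensates for the error that has already appeared.

```python
def track(desired, kp, use_feedforward, a=0.9, b=0.5, a_hat=0.8, b_hat=0.5):
    """Drive the plant along `desired`; return the outputs it actually produced."""
    y, previous_target = 0.0, 0.0
    outputs = []
    for target in desired:
        # Feedback (reactive): correct the tracking error observed so far.
        u = kp * (previous_target - y)
        if use_feedforward:
            # Feedforward (deliberative): ask the imperfect inverse model
            # which input should take the plant to the next target.
            u += (target - a_hat * previous_target) / b_hat
        y = a * y + b * u
        outputs.append(y)
        previous_target = target
    return outputs


def final_error(desired, kp, use_feedforward):
    """Tracking error at the end of the run."""
    return abs(desired[-1] - track(desired, kp, use_feedforward)[-1])


if __name__ == "__main__":
    desired = [1.0] * 30
    print("feedforward only:", round(final_error(desired, 0.0, True), 3))
    print("combined:        ", round(final_error(desired, 1.0, True), 3))
```

In this toy setting the feedforward-only controller drifts steadily away from the target because its model is wrong, while adding the feedback term keeps the error bounded and much smaller.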

Both  feedback  and  feedforward  control  systems 
employ  concepts  such  as  signals,  states,  state  variables, 
events, and event tokens. Knowledge of plants,
processes, entities, attributes, and plant behavior is
stored in a variety of numerical and symbolic formulae,
rules,  state-transition  matrices,  and  state  graphs 
consisting  of  characters  and  words,  images,  frames, 
graphs,  and  pointers. 

Control  systems  must  also  deal  with  real-time 
issues  such  as  response  time,  bandwidth,  sampling  rate, 
stability,  observability,  and  controllability.  Control 
theory  has  developed  an  arsenal  of  mathematical  tools 
for  optimizing  designs  and  for  incorporating  adaptive 
and  learning  algorithms  into  the  control  system.  Most 
recently,  attention  has  been  focused  on  fuzzy  systems, 
neural  nets,  and  genetic  algorithms. 

Semiotics 

Semiotics  is  the  science  of  signs  and  symbols. 
Fundamental  to  signs  and  symbols  is  the  notion  of 
representation. The dictionary definition of the word
"represent" is "to be a symbol for or an image of
something in the world." Representation requires data
structures that represent (and presumably correspond to)
entities and situations in the world.

Symbol  grounding  refers  to  the  problem  of 
establishing  and  maintaining  correspondence  between 
symbols  in  the  data  structure  and  things  in  the  world. 
In an intelligent system, symbol grounding is achieved
by  sensory  processing  and  world  modeling  functions 
that  establish  and  maintain  correspondence  between  the 
internal  world  model  and  external  reality. 

It was Immanuel Kant who first made clear the
distinction  between  a  representation  that  exists  in  the 
mind  and  objective  reality  that  exists  in  the  world.  We 
need  to  keep  this  distinction  clear  in  the  design  of 
intelligent semiotic systems. Entities and situations in
the world are sensed, producing observed signals and
images that are segmented into observed entities.
Observed  entities  and  images  are  filtered  to  form 
estimated  entities  and  images.  Dynamic  models  can 
then  produce  predicted  entities  and  images.  Predicted 
entities  and  images  plus  desired  goals  can  produce  plans 
and  actions  in  the  world. 

Semiotics  has  a  lot  to  do  with  language, 
particularly  regarding  semantics,  pragmatics,  causality, 
and  plausibility. 

Semantics  defines  the  meaning  of  what  is  encoded 
in  grammatically  correct  strings  of  symbols  and  words. 
One  of  the  most  basic  forms  of  meaning  is  contained  in 
the  logical  calculus  of  deduction,  inference,  and 
probability.  Semantics  can  be  formalized  in  relational 
graphs  consisting  of  networks  of  nodes  and  edges  where 
the  nodes  are  words  and  the  edges  are  relationships. 
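
As an illustration of such a relational graph, here is a minimal Python sketch. The class name RelationalGraph, its methods, and the example vocabulary are hypothetical and chosen only for this example. Words are nodes, each edge carries the name of a relationship, and a simple transitive walk over one kind of edge stands in for deductive inference.

```python
from collections import defaultdict


class RelationalGraph:
    """Words as nodes; named relationships as directed, labeled edges."""

    def __init__(self):
        self._edges = defaultdict(set)  # node -> {(relation, node), ...}

    def add(self, subject, relation, obj):
        self._edges[subject].add((relation, obj))

    def related(self, subject, relation):
        """Nodes one edge away from `subject` along edges labeled `relation`."""
        return sorted(o for r, o in self._edges.get(subject, ()) if r == relation)

    def entails(self, subject, relation, obj):
        """Follow `relation` edges transitively: a very simple deduction."""
        seen, stack = set(), [subject]
        while stack:
            node = stack.pop()
            for nxt in self.related(node, relation):
                if nxt == obj:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False


def example_graph():
    """A tiny, made-up fragment of manufacturing vocabulary."""
    graph = RelationalGraph()
    graph.add("lathe", "is_a", "machine tool")
    graph.add("machine tool", "is_a", "machine")
    graph.add("lathe", "used_for", "turning")
    return graph


if __name__ == "__main__":
    graph = example_graph()
    print(graph.entails("lathe", "is_a", "machine"))
    print(graph.related("lathe", "used_for"))
```

The same structure can hold other edge labels, such as what a thing is used for or what it causes, simply by adding edges with a different relation name.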

Relational  graphs  can  also  be  used  to  describe 
pragmatics  (what  things  are  used  for)  and  causality 
(what  things  cause  other  things).  Rules,  equations, 
formulae,  and  algorithms  can  be  used  to  define  how 
things  work  and  what  effect  some  things  have  on  other 
things.  For  example,  F  =  ma  is  a  formula  that 
describes  the  relationship  between  force  applied  to  a 
mass  and  the  acceleration  that  will  be  experienced  by 
that  mass.  The  formula  F=ma  can  be  used  to  compute 
how  much  force  is  needed  to  produce  a  desired 
acceleration  or  how  much  acceleration  to  expect  when  a 
particular  force  is  applied. 
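
The two uses of the formula can be written out directly. The function names and the choice of SI units are assumptions made for this sketch.

```python
def force_needed(mass_kg, acceleration_m_s2):
    """F = m * a: force (newtons) required to produce a desired acceleration."""
    return mass_kg * acceleration_m_s2


def acceleration_expected(mass_kg, force_n):
    """a = F / m: acceleration (m/s^2) to expect when a given force is applied."""
    if mass_kg <= 0:
        raise ValueError("mass must be positive")
    return force_n / mass_kg


if __name__ == "__main__":
    print(force_needed(2.0, 3.0))           # planning: which force do I need?
    print(acceleration_expected(2.0, 6.0))  # prediction: what will happen?
```

The first function reads the rule in the planning direction, from desired effect to required cause. The second reads it in the prediction direction, from cause to expected effect.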

Situations can be represented by words and
narrative text. However, situations can also be
represented by images, drawings, maps, and pictures.
For reasons that I do not fully understand, semiotics
seems much less concerned with images and maps than
with words and symbols. Apparently, a strong
preference exists among researchers in semiotics, as
well as in artificial intelligence and linguistics, to
represent the world in symbolic form -- i.e., in frames,
relational graphs, and text, rather than in pictures and
maps.

But words are often inadequate to fully describe
the richness and dynamic vibrancy of real world
situations. Especially if we take into account the
totality of the sensory experience -- vision, hearing,
smell, taste, touch, and gut feelings -- words become a
cumbersome and sparse medium for representation.

Images as Representation

In  this  presentation,  I  want  to  argue  the  case  that 
image  representations  are  much  more  important  than 
anyone  would  guess  from  reading  either  the  semiotic  or 
the  control  theory  literature.  I  would  argue  that  images 
are  not  just  sensory  input  to  be  dispensed  with  as  soon 
as  it  is  possible  to  derive  a  symbolic  representation. 
Images  are  repositories  of  information  about  spatial  and 
temporal  detail  that  cannot  be  represented  as  efficiently 
in  any  other  format. 

It  is  clear  that  the  use  of  images  for  representation 
of  knowledge  and  computing  developed  very  early  in 
the  evolution  of  the  brain.  Visual  competence  existed 
in  the  brains  of  sea  creatures  and  insects  almost  a 
billion  years  ago.  Fish,  crustaceans,  and  insects  have 
eyes  and  neural  computing  structures  for  processing 
images.  Birds  and  mammals  all  represent  the  visual 
world  and  process  images  for  purposes  of  locomotion 
and  manipulation.  Vision  is  crucial  to  hunting  for  food 
and  evading  danger  in  almost  every  species.  There  is 
good  neurophysiological  evidence  from  birds  and 
primates  that  neurons  on  the  surface  of  the  superior 
colliculus  are  anatomically  arranged  so  as  to  form  a 
visual  map  on  which  targets  of  attention  are  displayed. 
Signals  from  this  genetically  designed  map 
representation  are  then  transformed  directly  into 
commands  to  muscles  to  move  the  eyes. 

In  marked  contrast,  language  comes  very  late  in 
the  evolutionary  history  of  the  brain.  Birds,  mammals, 
and  even  insects  use  a  variety  of  signs  and  symbols  to 
signal  their  presence  and  their  sexual  interests.  Many 
animals  use  symbolic  gestures  or  sounds  to 
communicate  intentions  such  as  aggression, 
submissiveness,  friendliness,  or  fear,  or  to  warn  others 
of  danger.  Nevertheless,  in  all  species,  language 
competence  is  primitive  when  compared  to  visual 
processing  capabilities.  Among  humans,  the  level  of 
language  competence  that  most  occupies  researchers 
probably  did  not  exist  in  verbal  or  sign  language  form 
more than 100,000 years ago. Written forms of language
first  appeared  only  about  10,000  years  ago. 

It is said that a picture is worth a thousand words.
Clearly, the capabilities of the brain for representing
and processing visual information far exceed those for
language representation and processing. From this fact
alone, I would think that more emphasis should be
placed on the use of images as a mechanism for
thinking about and planning for the future. We should
pay more attention to computing with images, rather
than computing on images to extract symbolic data --
and then discarding the images.

Images  are  derived  directly  from  sensors.  The 
sensor  egosphere  is  a  very  natural  coordinate  frame  for 
representing  knowledge  about  the  world  --  especially 
knowledge  about  where  things  are  and  how  they  are 
moving.  We  often  communicate  information  through 
pointing  to  things.  We  show  each  other  how  to  do 
things.  We  compute  where  to  place  our  feet  based  on 
where  supporting  surfaces  appear  in  our  visual  world. 
We  visually  track  targets  we  intend  to  pursue  and  look 
directly  at  things  we  intend  to  manipulate.  We  flee 
from  parts  of  the  egosphere  that  contain  threatening 
images. 

Images  that  are  overlaid  with  symbolic 
information  about  objects  are  called  maps.  Maps  can 
be  used  to  represent  situations  and  to  make  plans.  For 
example,  military  maps  of  the  battlefield  typically 
show  the  elevation  and  slope  of  the  terrain,  the  type  of 
ground  cover  (trees,  grass,  buildings,  roads),  the  type  of 
terrain  (lake,  mountain,  forest,  open  fields),  and  the 
disposition of friendly and enemy forces. Soldiers
frequently use maps and physical models of the terrain
to plan battles. Travelers rely on maps for planning
and  executing  trips.  Navigators  for  ships  and  airplanes 
use  maps  to  figure  out  where  they  are,  and  to  plan  what 
direction  to  go,  and  when  to  turn. 

I  intend  to  pursue  this  issue  of  visual  (or  iconic) 
versus  symbolic  (or  language  based)  representations 
throughout  the  body  of  my  talk.  I  want  to  suggest  that 
the  study  of  signals,  images,  and  symbols,  and  the 
processes  that  transform  one  into  the  other  form  the 
basis  for  integration  of  semiotics  with  real-time 
control. 

Representation  to  Action 

I will argue that intelligent systems can improve
their performance by using both iconic and symbolic
representations  and  computational  methods.  Iconic  and 
symbolic  formats  can  and  should  be  tightly  linked  so 
that  information  computed  in  one  domain  can  be 
transformed  into  the  other  and  vice  versa.  For  example, 
combining  iconic  representations  with  symbolic 
information  produces  maps  that  can  be  used  to  plan  and 
control  behavior.  Regions  on  maps  can  be  specified  as 
goals.  Task  objects  can  be  specified  symbolically  by 
name  and  identified  visually  in  images.  Goals  can  also 
be defined by means of visual representations similar to
"key frames" used in cartoon movies. Situations can
be  specified  on  maps  showing  the  desired  configuration 
of  objects  relative  to  each  other  and  the  terrain. 
Attention  can  be  specified  in  terms  of  where  on  the 
egosphere  to  look  or  where  to  point  high  resolution 
sensor  arrays. 
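
The linkage between iconic and symbolic formats can be sketched as follows. Everything here is a hypothetical illustration: the terrain is a tiny character grid standing in for an image, the SYMBOLS dictionary overlays names on grid cells to make it a map, and the planner is an ordinary breadth-first search rather than any method from the reference model architecture. The goal is given symbolically, by name, and the plan is computed in the image.

```python
from collections import deque

# Iconic layer: a small "image" of the terrain ('#' = impassable, '.' = open).
TERRAIN = [
    "..#..",
    ".##..",
    ".....",
]

# Symbolic layer: names attached to locations (row, column) in the image.
SYMBOLS = {"robot": (0, 0), "depot": (0, 4)}


def plan_path(terrain, symbols, start_name, goal_name):
    """Return the shortest list of cells from one named location to another."""
    start, goal = symbols[start_name], symbols[goal_name]
    rows, cols = len(terrain), len(terrain[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and terrain[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None  # the goal region cannot be reached


if __name__ == "__main__":
    for cell in plan_path(TERRAIN, SYMBOLS, "robot", "depot"):
        print(cell)
```

The symbolic layer answers "which object?", the iconic layer answers "where, and what lies in between?", and the plan needs both.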

I  also  want  to  describe  a  reference  model 
architecture  that  incorporates  both  symbolic  and  iconic 
representations  and  outline  an  engineering  methodology 
for  the  design  and  development  of  intelligent  control 
systems.  I  will  briefly  describe  some  application 
examples  developed  using  this  methodology. 

But,  in  the  end,  I  return  to  the  question  "Why 
Now  Semiotics?"  Why  should  real-time  control 
theorists  care  about  the  science  of  signs  and  symbols? 
And  why  should  semioticists  study  issues  related  to 
real-time  control? 

The  answer,  I  believe,  can  be  found  in  those  areas 
where  the  two  fields  intersect  and  address  common 
problems.  These  areas  include  the  acquisition, 
representation,  and  use  of  knowledge  about  the  world. 
How  can  signals  from  sensors  be  transformed  into 
images,  maps,  and  symbols  that  represent  states, 
attributes,  entities,  events,  values,  and  situations  in  the 
world?  How  can  knowledge  be  used  to  focus  attention, 
plan  action,  and  reason  about  the  best  course  of  action? 
How  does  an  intelligent  system  generate  and  express 
goals  and  intentions?  How  does  it  know  how  to 
behave  in  order  to  achieve  goals  in  an  uncertain  and 
often  hostile  environment?  The  answers  to  these 
questions  will  be  found  sooner  if  we  combine  what  is 
known  and  is  being  discovered  in  the  two  fields  of 
semiotics  and  real-time  intelligent  control. 

I believe semiotics can help control engineers
better understand how to represent images, maps,
entities, actions, rules, situations, and relationships. I
think  recursive  filtering  theory  and  image  processing 
research  can  help  semioticists  better  establish 
correspondence  between  the  real  world  and 
representations  of  the  world.  Given  these  better 
representations,  intelligent  systems  can  better  plan  and 
execute  tasks  amid  uncertainty.  Given  a  reference 
model  architecture,  researchers  can  better  understand 
how  to  partition  the  control  system  so  that  various 
kinds  of  information  can  be  learned.  Learning  can  only 
take  place  within  a  prior  structure  where  incremental 
improvements  in  performance  can  be  rewarded  and 
failures  can  be  incrementally  corrected.  Given  an 
engineering  methodology,  engineers  can  better 
understand  how  to  approach  the  design  of  intelligent 
systems. 

Semiosis 

I  would  argue  that  the  combination  of  real-time 
control  and  semiotics  will  enable  us  to  build  more 
intelligent  systems  for  practical  applications. 
Semiotics  gives  us  a  better  understanding  of  semantics, 
pragmatics,  causality,  logic,  inference,  probability,  and 
plausibility. Intelligent real-time control theory gives
us a better understanding of how to sense and process
sensory data into knowledge about the world, and how
to use that knowledge to plan and control behavior.
Combined,  these  two  diverse  fields  provide  insight  as 
to  how  better  to  acquire,  represent,  and  use  knowledge 
and  how  to  transform  knowledge  into  actions  that 
produce  real  wealth  for  the  benefit  of  society. 

This  then  is  the  answer  to  "Why  Now 
Semiotics?"  Now  is  the  time  to  integrate  semiotics 
with  real-time  control  to  produce  intelligent  systems 
for practical applications of great importance to the
future of humankind.

ABSTRACT

Rather  than  design  by  arranging  (and  rearranging)  components, 
why  not  use  fast-time  mutation  and  selection  to  evolve  the 
required  design?  Simply  generate  a  random  set  of  alternative 
designs.  Suggestions/hints  can  be  added  at  any  time,  but  are  not 
required,  for  the  process  learns  without  instruction.  Increasingly 
appropriate  designs  are  found  through  successive  generations. 
This  evolutionary  computation  has  been  used  to  detect  unknown 
signals  in  noise  (discovering  new  templates  for  later  pattern 
recognition),  to  design  optimal  neural  networks  (even  when  the 
response  surface  is  discontinuous),  for  forecasting 
nonstationary  time  series  given  an  arbitrary  payoff  function, 
for  determining  the  relationship  among  variables  (modeling  an 
unknown  transducer/plant),  for  the  control  of  that  plant  within 
the  limits  of  its  controllability,  and  for  gaming  if  that  plant  has 
its own purpose. There is, indeed, a wide range of other
applications.

Of  course,  all  this  presumes  the  problem  at  hand  is  well  defined, 
that  is,  that  a  meaningful  fitness/payoff  function  has  been 
specified.  This  requires  more  than  simply  measures  of 
effectiveness  and  related  measures  of  performance.  The  problem 
becomes  well  defined  only  if  these  measures  are  weighted  in 
relative  importance  and  combined,  taking  into  account  their 
degree  of  criticality.  The  Valuated  State  Space  Approach 
provides  a  convenient  way  to  translate  subjective  judgment  with 
respect  to  intent  into  a  concise  specification  of  the  purpose  to 
be  achieved.  Top-down  engineering  can  be  realized  by  coupling 
the  Valuated  State  Space  Approach  with  evolutionary 
programming. 

KEYWORDS: 

top-down  engineering,  evolutionary  programming, 
stochastic  optimization,  evolutionary  computation 

1.  INTRODUCTION

"en·gi·neer·ing, n. the planning, designing, construction,
or management of machinery," [1] begins with the recognition
of  a  need.  The  available  resources/components/building  blocks 
are  arranged  within  the  imposed  constraints.  This  construct  is 
evaluated.  The  weak  points/bottlenecks  are  identified. 
Adjustments  are  made  to  improve  the  design.  This  process 
continues  until  further  improvements  are  no  longer  cost 
effective.  The  entire  procedure  is  bottom-up. 


But  a  much  better  design  may  exist  .  .  .  one  that  cannot  be  found 
by  continually  improving  on  the  original  concept.  Searching 
the  streets  of  Los  Angeles  for  the  quickest  route  may  take  you 
away  from  a  freeway  entrance  that  might  provide  much  faster 
transit.  Continually  improving  the  gaslamp  will  never  make  it 
into  an  electric  light.  Why  not  design  top-down,  that  is,  use 
fast-time  computation  to  directly  search  the  domain  of  candidate 
solutions/designs  until  one  of  sufficient  worth  has  been  found? 

At  first  glance  this  might  seem  unreasonable,  for  the  domain  of 
possible  solutions  is  so  immense.  Exhaustive  search  is  clearly 
impossible.  It  becomes  feasible  only  if  an  extremely  efficient 
search  technique  is  available.  Why  not  simulate  the  way  nature 
evolves  increasingly  appropriate  creatures/designs  over 
millennia?  Each  organism  (design)  faces  the  challenge  posed  by 
its  environment  and  is  thus  scored.  Those  that  are  sufficiently 
fit  produce  offspring  that  are  similarly  tested.  This  optimization 
process  is  extremely  efficient  and  can  be  used  to  address  well- 
defined  engineering  problems,  that  is,  to  find  a  good  enough 
solution  fast  enough  for  it  to  be  useful. 

2.  DISCUSSION 

Real-world  problems  are,  indeed,  extremely  complex.  The 
constraints  are  nonlinear  and  the  objective  function  may  change 
as  the  situation  develops.  To  solve  these,  conventional 
engineering  relies  on  simplifying  assumptions.  For  example,  the 
actual  problem  is  often  separated  into  components  that  can  be 
more  easily  addressed.  But  locally  optimal  components  do  not 
integrate  to  yield  a  globally  optimal  solution  unless  the 
components  are  independent  .  .  .  and  they  rarely  are.  We  call  upon 
linear  programming  even  when  the  constraints  are  known  to  be 
nonlinear.  Steepest  descent  is  used  even  when  the  response 
surface  may  have  multiple  modes,  be  discontinuous,  noisy  or,  in 
the  limit,  have  no  gradient.  Spectral  analysis  and  Markov 
processes  are  used  to  predict  time  series,  even  when  the  actual 
environment  is  known  to  be  nonstationary.  Such  simplification 
often  yields  the  right  answer,  to  the  wrong  problem! 

Another  way  to  simplify  is  to  use  expert  systems.  Here,  the 
design  is  based  on  using  generally  useful  rules  (heuristics) 
derived  from  experience.  But  remembered  rules  may  be  in  error. 
The  sample  size  documenting  the  worth  of  individual  heuristics 
is  likely  to  be  insufficient.  This  is  especially  true  when  dealing 
with  a  dynamic  environment  where  the  objective  function  is 
changing  while  the  problem  is  being  solved.  And  rules  that  are 
good  "on  the  average,"  may  not  be  good  enough.  In  practice, 
we need a particularly appropriate way to solve the problem at
hand. Incidentally, when dealing with a nonstationary
environment,  the  rules  that  are  best  at  any  particular  time  are 
likely  to  be  far  less  than  best  at  other  times.  To  make  matters 
worse,  obeying  any  set  of  rules  is  certainly  not  the  best  way  to 
operate  in  an  environment  that  includes  intelligently 
interactive  competitors. 

Traditional  engineering  is,  indeed,  bottom-up.  The  design 
process  becomes  increasingly  complicated  and  cumbersome. 
Computers  are  used  to  solve  complex  sets  of  differential 
equations  .  .  .  rather  than  to  directly  address  the  problem  at 
hand.  But  we  can  now  generate  alternative  solutions  at 
negligible  cost,  and  thus  simulate  the  fundamental  process  of 
evolution  in  order  to  design  top-down.  Alternative  solutions  are 
scored  as  separate  entireties.  No  direct  attempt  is  made  to 
improve  any  of  these  individual  designs.  Those  found  inadequate 
are  simply  discarded  with  no  penalty.  The  solutions  that  are 
found  to  be  worthy  are  varied  to  yield  offspring,  thereby 
generating  a  new  population  (generation)  that  is  similarly 
scored.  As  in  nature,  the  selection  process  is  phenotypic. 
Alternative  designs/organisms  are  scored  only  in  terms  of  their 
expressed  behavior,  not  their  components  or  structure.  The 
mutation/selection  process  iterates  until  a  sufficiently  worthy 
solution  has  been  found  (or  the  available  computational 
resources  have  been  expended). 

This  fast-time  neo-Darwinian  evolution  has  a  significant 
history.  There  have  been  various  approaches:  evolution 
strategies,  evolutionary  programming,  genetic  algorithms  and 
related  efforts  in  genetic  programming  and  classifier  systems. 
The  first  two  of  these  are  very  similar,  the  primary  distinction 
being  whether  or  not  the  parents  are  part  of  the  next  generation. 
Genetic  algorithms  are  quite  different.  They  often  use  binary 
string  representations,  even  when  this  is  not  an  intuitive 
representation  for  the  problem  at  hand.  In  contrast,  in 
evolutionary  programming,  the  representation  follows  the 
designer's  intuition.  Any  suitable  choice  can  be  made.  Genetic 
algorithms  strongly  rely  on  crossing  over  to  generate  new 
offspring,  cutting  and  splicing  existing  solutions  in  the  hopes 
that  such  recombination  will  be  useful.  In  evolutionary 
programming,  no  emphasis  is  placed  on  any  particular  operator: 
again  the  mode  of  variation  can  reflect  the  representation 
chosen  as  well  as  the  known  interactivities  between 
components  of  the  solutions.  It  may  be  wise  to  recombine  ideas 
from  different  solutions  (for  example,  mix  ice  cream  and  root 
beer),  or  it  may  not  (for  example,  combining  the  first  half  of 
Hamlet  with  the  second  half  of  Macbeth  ...  it  is  still 
Shakespeare, but it doesn't make sense). Recent mathematical
proofs  have  indicated  that  crossover  provides  no  general 
advantage  to  solving  problems  [2],  and  so,  in  order  to  maximize 
the  practical  use  of  evolutionary  algorithms,  the  search 
operators  must  be  tailored  to  the  task  at  hand.  This  requires 
expertise. 

Finally,  with  regard  to  the  selection  mechanisms,  the 
traditional  genetic  algorithm  relies  on  proportional  selection, 
with  the  expected  number  of  copies  of  each  parent  being  made  in 
proportion to its relative fitness. In contrast, in evolutionary
programming, selection simply eliminates unfit solutions and
can  also  be  made  probabilistic  such  that  there  is  a  tunable 
likelihood  of  accepting  solutions  with  lower  fitness.  This 
ability  to  adjust  the  selection  pressure  offers  yet  another 
versatility  to  evolutionary  programming.  Note  that  it  naturally 
lends  itself  to  parallel  processing  and  can  be  operated  in  a 
hierarchic  manner,  that  is,  with  a  higher  level  of  evolution 
adapting  the  variables  of  the  lower  level.  This  may  involve 
adjusting  the  population  size,  number  of  generations  before 
each  decision,  the  selection  pressure,  the  mutation  noise 
distribution,  and  so  forth. 
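
The tunable selection pressure described above can be sketched as a stochastic tournament. This is one common variant from the evolutionary programming literature and is not claimed to be the scheme used in the work reported here; the function name `tournament_select` and the parameter `q` (the number of opponents each candidate meets) are assumptions made for illustration. With small `q`, weaker solutions survive fairly often; with large `q`, selection approaches strict truncation.

```python
import random

def tournament_select(scored, n_keep, q, rng):
    """Keep n_keep of the (error, solution) pairs in `scored`.

    Each candidate meets q randomly drawn opponents and earns one win
    for every opponent whose error is not lower than its own.
    Candidates are ranked by wins; ties are broken at random.
    """
    ranked = []
    for err, sol in scored:
        wins = sum(err <= rng.choice(scored)[0] for _ in range(q))
        ranked.append((wins, rng.random(), err, sol))
    # Most wins first; the random second key breaks ties without bias
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [(err, sol) for _, _, err, sol in ranked[:n_keep]]

rng = random.Random(1)
scored = [(float(e), "design-%d" % e) for e in range(10)]  # toy errors 0..9
soft = tournament_select(scored, 5, q=1, rng=rng)    # weak selection pressure
hard = tournament_select(scored, 5, q=200, rng=rng)  # strong selection pressure
```

Here `q` plays the role of the adjustable selection pressure; a higher level of evolution could treat it as one more variable to adapt.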

Indeed,  evolutionary  programming  [3],  [4],  [5]  is  a  most  general 
optimization  technique.  The  only  "rules"  are  iterative  mutation 
and  selection.  In  its  standard  form,  the  basic  evolutionary 
program  uses  the  four  main  components  of  all  evolutionary 
computation  algorithms:  initialization,  variation,  evaluation 
(scoring),  and  selection  [6],  [7].  At  the  basis  of  this,  as  well  as 
other  evolutionary  computation  algorithms,  is  the  presumption 
that, at least in a statistical sense, learning is encoded
phylogenetically rather than ontogenetically in each member of the
population.  'Learning'  is  a  byproduct  of  the  evolutionary 
process  as  successful  individuals  are  retained  through  stochastic 
trial  and  error.  Variation  (such  as  mutation)  provides  the  means 
for  moving  solutions  around  on  the  search  space,  preventing 
entrapment  in  local  minima.  The  evaluation  function  directly 
measures  fitness,  or  equivalently  the  behavioral  error,  of  each 
member  in  the  population  with  regard  to  the  environment. 
Finally,  the  selection  process  probabilistically  culls 
suboptimal  solutions  from  the  population,  providing  an 
efficient  method  for  searching  the  topography. 

More  specifically,  evolutionary  programming  starts  with  a 
population  of  randomly  chosen  trial  solutions  (designs/plans  of 
action,  that  is,  alternative  ways  of  allocating  the  available 
resources  over  time).  These  may  include  suggestions,  but  they 
are  not  required.  Each  of  these  trial  solutions  is  evaluated  with 
regard  to  the  specified  fitness  function.  Only  some  are  retained 
to  serve  as  parents.  Each  parent  is  altered  through  application  of 
a  mutation  process.  Mutations  on  the  degrees  of  freedom  are 
chosen  with  respect  to  a  probability  distribution,  typically 
uniform.  The  number  of  mutations  per  offspring  is  also  chosen 
with  respect  to  a  probability  distribution  or  it  may  be  fixed  a 
priori.  These  offspring  solutions  are  then  evaluated  over  the 
existing  environment  in  the  same  manner  as  their  parents.  After 
the  fitness  or  behavioral  error  is  assessed  for  all  offspring,  the 
selection  process  is  performed  in  one  of  several  ways, 
depending  on  current  knowledge  of  the  response  surface.  In 
most  applications,  the  size  of  the  population  remains  constant, 
but  there  is  no  restriction  in  the  general  case.  The  process  is 
halted when the solution reaches a predetermined quality,
a specified number of iterations has been completed, or some
other criterion (such as sufficient convergence) stops the
algorithm.
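
The procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: solutions are real-valued vectors, each parent yields exactly one offspring, mutation noise is uniform, parents and offspring compete together, and the names `evolve`, `fitness_error`, `step`, and `good_enough` are hypothetical.

```python
import random

def evolve(fitness_error, dim, pop_size=20, generations=200,
           low=-5.0, high=5.0, step=0.5, good_enough=None, seed=0):
    """Minimize fitness_error over real vectors by mutation and selection."""
    rng = random.Random(seed)
    # Initialization: a population of randomly chosen trial solutions
    population = [[rng.uniform(low, high) for _ in range(dim)]
                  for _ in range(pop_size)]
    # Evaluation: score every trial solution as a whole
    scored = sorted((fitness_error(x), x) for x in population)
    for _ in range(generations):
        # Variation: each parent yields one offspring; every variable
        # is perturbed at once with uniformly distributed noise
        offspring = [[xi + rng.uniform(-step, step) for xi in parent]
                     for _, parent in scored]
        scored_offspring = [(fitness_error(x), x) for x in offspring]
        # Selection: parents and offspring compete; the worse half is discarded
        scored = sorted(scored + scored_offspring)[:pop_size]
        # Halting: stop early once a solution of sufficient quality exists
        if good_enough is not None and scored[0][0] <= good_enough:
            break
    return scored[0]

def sphere(x):
    """Toy behavioral error: squared distance from the origin."""
    return sum(xi * xi for xi in x)

start_error, _ = evolve(sphere, dim=3, generations=0)
final_error, final_design = evolve(sphere, dim=3)
```

Because the best parent always competes with the offspring in this sketch, the best error can never increase from one generation to the next. Suggestions could be introduced simply by appending them to the initial population.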

Evolutionary  programming  differs  philosophically  from  other 
evolutionary  computational  techniques  such  as  genetic 
algorithms  in  a  crucial  manner.  It  is  purely  top-down  rather  than 
bottom-up optimization. It is important to note that (according
to  neo-Darwinism)  selection  operates  only  on  the  phenotypic 
expressions  of  a  genotype;  the  underlying  coding  of  the 
phenotype  is  only  affected  indirectly.  The  realization  that  a  sum 
of  optimal  parts  rarely  leads  to  an  optimal  overall  solution  is 
key  to  this  philosophical  difference.  Genetic  algorithms  rely  on 
the  identification,  combination,  and  survival  of  "good" 
building blocks (schemata), iteratively combining these to form
larger  "better"  building  blocks.  In  a  genetic  algorithm,  the 
coding  structure  (genotype)  is  of  primary  importance  as  it 
contains  the  set  of  optimal  building  blocks  discovered  through 
successive  iterations.  The  building  block  hypothesis  is  an 
implicit  assumption  that  the  fitness  is  a  separable  function  of 
the  parts  of  the  genome.  This  successively  iterated  local 
optimization  process  is  fundamentally  different  from 
evolutionary  programming  which  is  an  entirely  global 
approach  to  optimization.  Solutions/designs  (or,  by  analogy, 
organisms)  in  an  evolutionary  programming  algorithm  are 
judged  solely  on  their  fitness  with  respect  to  the  given 
environment.  No  attempt  is  made  to  partition  credit  to 
individual  components  of  the  solutions.  In  evolutionary 
programming  (and  in  evolution  strategies),  the  variation 
operator allows for simultaneous modification of all variables.
Fitness, described in terms of the behavior of
each population member, is evaluated directly, and is the sole
basis  for  survival  of  an  individual  in  the  population.  A 
crossover  operation  designed  to  recombine  building  blocks  is 
not  utilized  in  the  general  forms  of  evolutionary  programming. 
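
A toy case makes the point about separability concrete. The worth table below is invented purely for illustration: two design components interact, so the worth of the whole is not a sum of per-component contributions. Improving one component at a time stalls at the starting design, whereas scoring complete designs as entireties finds the better one.

```python
from itertools import product

# Hypothetical worth of a two-component design; the components interact,
# so worth is not a sum of independent per-component contributions.
WORTH = {(0, 0): 2, (1, 0): 1, (0, 1): 1, (1, 1): 3}

def improve_one_component_at_a_time(design):
    """Flip single components while any single flip raises the worth."""
    improved = True
    while improved:
        improved = False
        for i in range(len(design)):
            trial = design[:i] + (1 - design[i],) + design[i + 1:]
            if WORTH[trial] > WORTH[design]:
                design, improved = trial, True
    return design

def best_whole_design():
    """Score every complete design as an entirety and keep the best."""
    return max(product((0, 1), repeat=2), key=WORTH.get)

stuck = improve_one_component_at_a_time((0, 0))  # stays at (0, 0), worth 2
best = best_whole_design()                       # finds (1, 1), worth 3
```

Reaching (1, 1) from (0, 0) requires changing both components at once, which is exactly what a variation operator acting on all variables simultaneously permits.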

Evolutionary  programming  can  be  used  to  detect  unknown 
signals  in  noise  (discovering  templates  for  pattern 
recognition),  to  design  optimal  neural  networks  (even  when  the 
response  surface  is  discontinuous),  for  forecasting 
nonstationary  time  series  given  an  arbitrary  payoff  function, 
for  discovering  the  relationship  among  variables  (modeling  an 
unknown  transducer/plant),  for  the  control  of  that  plant  within 
the  limits  of  its  controllability,  and  for  gaming  if  that  plant  has 
its  own  purpose.  It  has  been  demonstrated  for  improving  drug 
design  (Agouron  Pharmaceuticals,  Inc.),  for  adaptive  control  of 
freeway  ramp  signals  (CalTrans),  for  scheduling  the  production 
of  clothing  (Levi  Strauss  &  Company),  for  the  distribution  of 
fuel  (Chevron),  for  aiding  the  detection  of  breast  cancer  (U.S. 
Army  Medical  Research  and  Materiel  Command),  and  for  various 
other  military  programs  that  concern  improved  training  and 
real-time  mission  planning. 

But  all  too  often  engineering  problems  as  posed  are  ill-defined. 
Are  the  stated  requirements  hard  or  soft  constraints?  If  soft,  how 
soft?  What  else  should  be  measured?  Are  all  these  measures 
equally  important?  Are  some,  or  all  of  them,  critical  (that  is, 
failure  in  this  regard  nullifies  the  value  of  any  other 
achievement)?  Are  there  degrees  of  criticality?  We  often  simply 
minimize  the  mean  squared  error  rather  than  refer  to  the  real 
payoff  matrix  wherein  equally  correct  outcomes  are  not  of  equal 
worth  and  equal,  but  opposite,  errors  have  different  costs. 
Clearly,  measures  of  effectiveness  (MOEs)  and  the  associated 
measures  of  performance  (MOPs)  do  not  tell  the  whole  story. 
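
The difference between minimizing mean squared error and consulting the real payoff can be seen in a small sketch. The outcomes, the cost ratio, and the function names below are all assumptions chosen for illustration: under-forecasting is taken to cost five times as much per unit as over-forecasting.

```python
def mean_squared_error(forecast, outcomes):
    """Symmetric score: equal but opposite errors cost the same."""
    return sum((forecast - y) ** 2 for y in outcomes) / len(outcomes)

def asymmetric_cost(forecast, outcomes, under=5.0, over=1.0):
    """Payoff-based score: a shortfall costs `under` per unit,
    an excess costs `over` per unit (both values are hypothetical)."""
    total = 0.0
    for y in outcomes:
        error = forecast - y
        total += over * error if error >= 0 else under * (-error)
    return total / len(outcomes)

outcomes = [8, 9, 10, 11, 12]                # toy observations
candidates = [x / 2 for x in range(12, 29)]  # forecasts 6.0, 6.5, ..., 14.0

best_by_mse = min(candidates, key=lambda f: mean_squared_error(f, outcomes))
best_by_payoff = min(candidates, key=lambda f: asymmetric_cost(f, outcomes))
```

With these numbers the squared-error criterion selects 10.0 (the mean), while the asymmetric payoff selects 12.0: the right answer to the first problem is the wrong answer to the second.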


Indeed,  effective  engineering  begins  with  a  clear  understanding 
of what must be achieved by when. But what if that outcome is
not realized? Surely some value is found in lesser degrees of
achievement. There are even times when our primary concern is
to  avoid  a  particularly  undesirable  situation.  In  other  words,  to 
be  meaningful,  we  must  identify  each  of  the  significantly 
different  futures,  all  the  way  from  Utopia  to  catastrophe,  for  only 
then  can  we  measure  the  overall  worth  of  any  given  situation. 

Unfortunately,  it  is  difficult  to  envision  these  significantly 
different futures, much less their relative worth. The Valuated State
Space  Approach  provides  a  convenient  way  to  overcome  this 
difficulty.  This  method  allows  the  responsible  individual  to 
portray  the  significantly  different  futures  in  terms  of 
preferentially independent¹ aspects of concern. These are the
dimensions  of  a  hyperspace,  each  being  attributed  some  relative 
importance  and  made  measurable  by  designating  mutually 
exclusive  class  intervals  that  indicate  those  differences  that 
make  a  difference  in  degree  of  achievement.  The  class  intervals 
exhaustively  span  the  range  of  each  parameter  from  the  most  to 
the  least  desirable  degree,  each  of  these  being  attributed  some 
value.  When  parameters  are  not  directly  measurable,  they  are 
expressed  in  terms  of  subparameters  that  are  measurable  or 
extended  to  still  lower  levels  in  the  hierarchy  that  include 
directly  measurable  subparameters.  Thus,  the  purpose  to  be 
achieved  takes  the  form  of  a  hierarchic  Valuated  State  Space 
together  with  a  normalizing  function  that  can  be  used  to 
combine  contributions  on  the  various  parameters  and 
subparameters  in  a  manner  that  reflects  their  degree  of 
criticality. 
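
A hierarchic state space of this kind can be sketched as a small tree. Everything concrete below is an assumption for illustration: the parameter names, weights, interval boundaries, and interval values are invented, and the weighted arithmetic mean is used as the combining function at every level.

```python
from bisect import bisect_right

def interval_value(measurement, bounds, values):
    """Return the value of the class interval containing the measurement.

    `bounds` are ascending interior boundaries, so the intervals are
    mutually exclusive and exhaustive; `values` has len(bounds) + 1 entries.
    """
    return values[bisect_right(bounds, measurement)]

def worth(node, measurements):
    """Score a node: a leaf is looked up, a parent combines its children."""
    if "children" in node:
        total = sum(weight for weight, _ in node["children"])
        return sum(weight / total * worth(child, measurements)
                   for weight, child in node["children"])
    return interval_value(measurements[node["name"]],
                          node["bounds"], node["values"])

# Hypothetical mission: survival matters three times as much as progress.
survival = {"name": "fraction_surviving",
            "bounds": [0.25, 0.5, 0.75], "values": [0.0, 0.3, 0.7, 1.0]}
progress = {"name": "meters_to_goal",
            "bounds": [100, 500], "values": [1.0, 0.5, 0.0]}
mission = {"children": [(3, survival), (1, progress)]}

score = worth(mission, {"fraction_surviving": 0.6, "meters_to_goal": 300})
```

A parameter that is not directly measurable is simply another node with children, so the same function handles any depth of hierarchy.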

In  many  situations  some  overall  worth  is  realized  if  there  is 
some  achievement  in  any  regard.  It  is  then  appropriate  to  use 
the  weighted  arithmetic  mean.  In  other  situations,  all  of  the 
parameters  may  be  critical.  Under  these  conditions,  the 
normalizing  function  takes  the  form  of  a  weighted  geometric 
mean,  that  is,  the  overall  worth  is  the  product  of  the  relative 
degrees  of  achievement,  each  raised  to  the  power  of  the 
normalized  importance  of  that  parameter.  If  all  the  achievement 
values  are  high,  the  weighted  arithmetic  mean  can  be  used  as  an 
optimistic  estimate  of  the  overall  worth,  for  the  difference 
between  these  means  becomes  negligible  at  successively  higher 
values of achievement in every regard [6]. Alternative "what
ifs"  can  then  be  scored  in  view  of  their  corresponding  futures. 
The  best  is  selected  and  there  is  a  list  of  the  remaining 
deficiencies,  by  priority.  The  worth  of  partially  resolving  any 
of  these  remaining  problems  can  easily  be  computed. 
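
The two normalizing functions can be compared directly. With normalized importances w_i (summing to one) and degrees of achievement a_i in [0, 1], the weighted arithmetic mean is the sum of w_i a_i and the weighted geometric mean is the product of a_i raised to the power w_i. The sketch below uses hypothetical function names and invented numbers.

```python
import math

def normalized(weights):
    """Scale the relative importances so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def arithmetic_worth(achievement, weights):
    """Some worth results from achievement in any regard."""
    return sum(w * a for w, a in zip(normalized(weights), achievement))

def geometric_worth(achievement, weights):
    """Every parameter is critical: zero achievement anywhere gives zero worth."""
    return math.prod(a ** w for w, a in zip(normalized(weights), achievement))

weights = [3, 2, 1]                  # hypothetical relative importances
one_failure = [0.9, 0.8, 0.0]        # third parameter not achieved at all
all_high = [0.95, 0.90, 0.92]        # high achievement in every regard

gap_failure = (arithmetic_worth(one_failure, weights)
               - geometric_worth(one_failure, weights))
gap_high = (arithmetic_worth(all_high, weights)
            - geometric_worth(all_high, weights))
```

When one critical parameter fails, the geometric mean falls to zero while the arithmetic mean still reports substantial worth; when all achievements are high, the gap between the two is small, consistent with using the arithmetic mean as an optimistic estimate.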

But  organizations  operate  in  a  cooperative  or  competitive 
environment.  Our  best  move  may  not  be  best  if  it  significantly 
injures  a  friendly  organization  and/or  greatly  benefits  an 
adversary.  The  Valuated  State  Space  Approach  can  also  be  used 
to express the presumed purposes of the other players so that
their best moves and ours can be estimated in light of their
mutual attitudes and the state of the game.

¹ wherein any change in the degree of achievement on a
parameter does not alter the relative importance of the
parameters.

Engineering  decisions  are  improved  when  the  purpose  is  made 
clear and concise, for means have value only in terms of ends.
Evolutionary programming can then be used to discover
increasingly  appropriate  designs  and/or  modes  of  behavior  in 
view  of  the  presumed  dynamics/constraints. 

This capability has recently been demonstrated in gaming, by
generating non-rule-based, intelligently interactive adversaries
in tank platoon combat simulation for the sake of improved
training. Effective training requires an interactive adversary.
Having qualified individuals simulate the opposing force is
dangerous, for it presumes an understanding of their culture and
intent. Such simulations are non-repeatable and cannot be
calibrated. It is equally dangerous to train against any
rule-based enemy (expert system), for the real enemy is
intelligently interactive, learns, may demonstrate initiative,
and is likely to behave in a generally unpredictable manner.
Three series of experiments
were  conducted  using  the  Modular  Semi  Automated  Forces 
(ModSAF)  simulation  program  that  embodies  the  terrain, 
platforms,  and  low-level  deconfliction  [8].  These  tank  warfare 
experiments  were  performed  using  a  Sparc  20  Work  Station  in 
real-time.  In  the  first  series,  opposing  tank  platoons  were 
situated  arbitrarily,  each  given  different  missions  in  the  form  of 
a  Valuated  State  Space  and  normalizing  function.  These  include 
the  weighted  probabilities  of  kill  and  survival,  as  well  as  the 
time  and  distance  remaining  to  reach  a  specified  goal-point.  In 
one  experiment,  both  platoons  had  unlimited  visibility  and 
equal  firing  range  of  1200  meters.  The  primary  concern  of  both 
platoons  was  survival.  Realistic  physical  constraints  were 
imposed,  for  example,  the  tanks  could  not  move  faster  than  25 
meters  per  second.  Tactical  decisions  were  made  every  20 
seconds  after  30  generations  of  mutation  and  selection,  the  best 
evolved  behavior  being  implemented.  In  this  manner,  there  was 
learning  across  successive  decision  points. 

The  platoons  moved  toward  their  goal-points  and,  in  this  case, 
toward  one  another.  They  stopped  just  before  reaching  the  firing 
range. They behaved erratically until finding themselves in
synchrony. Each then took a path toward its goal that remained
outside the firing range. In essence, both platoons discovered
"evasion"  .  .  .  without  understanding  this  concept.  In  another 
experiment, a platoon traded off the worth of continuing toward
its  goal-point  as  opposed  to  pursuing  the  remaining  enemy 
tanks.  Throughout  these  experiments,  the  exhibited  behavior 
seems  rational. 

In  the  second  series  of  experiments,  one  platoon  was  provided 
information  concerning  the  mission  of  the  other  platoon.  This 
resulted  in  improved  performance.  In  the  third  series  of 
experiments,  the  same  platoon  was  given  misinformation  about 
the  enemy's  mission,  this  resulting  in  a  measurable  cost. 
Obviously,  the  decision  points  can  be  made  a  function  of  the 
combat tempo. The less time between the decisions, the greater
the population and the larger the number of generations before
each decision, the greater the level of simulated intelligence.
Note  that  the  level  of  intelligence  and,  indeed,  the  mission  can 
be  changed  at  any  time  during  the  engagement.  This  combat 
simulation  can  be  extended  to  higher  levels  of  command 
wherein  diverse  platforms  are  interactively  controlled  in  a 
coordinated  manner.  In  essence,  here  is  a  simulation  of  co- 
evolution. 

This  non-rule-based  simulation  can  also  be  used  to  provide 
effective  decision  support,  for  evaluating  alternative  weapon 
systems  capabilities  and  the  tactics/doctrine  required  for  these 
systems.  In  fact,  it  may  be  used  on-line  to  generate  particularly 
worthwhile  tactics/doctrine  in  light  of  the  developing  situation. 
This  simulation  can  also  be  used  as  a  basis  for  measuring  the 
impact  of  soft  intelligence  factors  on  the  outcome  of  combat. 
To what extent do levels of training, loyalty, fatigue,
cohesiveness,  and  so  forth  modify  force  effectiveness? 

3.  CONCLUSION 

Mathematics  has  long  been  the  primary  tool  of  engineering. 
Linear  approximations  yield  first  cut  solutions.  Far  more 
complex  representations  are  required  to  treat  actual  real-world 
problems.  This  formalism  requires  significant  computation.  We 
allocate  extensive  CPU  time  to  solve  differential  equations  that 
portray  time-dependent  interactive  dynamic  systems. 

But  computers  can  be  used  more  directly.  There  is  no  need  for  the 
intervening  mathematics.  The  machine  itself  becomes  a  fast- 
time  model  of  natural  evolution.  Rather  than  yield  a  single  best 
design,  it  provides  an  evolved  population  of  solutions,  each  of 
measured  worth.  The  engineer  can  review  these,  taking  into 
account  concerns  that  were  not  directly  stated  in  the  payoff 
function: this design scored well but may have undesirable side
effects or unintended consequences, that design may prove
too costly, and so forth. The ultimate design is selected from
those that have survived extensive competition. If
conventional designs are introduced as suggestions and are of
true worth, they will be so recognized by having survived the
selection process. Evolutionary computation is not in
competition with other methods of optimization; it simply
provides a way of improving on their results. In essence, it is a
simulation  of  the  scientific  method  and,  therefore,  an 
embodiment  of  creativity. 

Of  course,  we  must  still  recognize  the  need  for  a  design  and 
express  that  need  in  concise  terms  for,  to  be  effective,  any 
optimization  technique  must  reference  an  explicit  scoring 
function.  But  ordinarily  ends  are  stated  in  terms  of  means.  The 
contractor  sees  a  new  building  in  terms  of  brick  and  mortar 
rather  than  the  efficiency  of  layout,  comfort,  cost  of 
maintenance,  longevity  of  service  and,  of  course,  ability  to 
withstand  environmental  extremes.  This  no  longer  need  be  the 
case.  We  now  have  a  convenient  formalism  for  representing 
purpose/intent/mission.  We  can  now  solve  complex  problems 
using top-down engineering by coupling the Valuated State
Space  Approach  with  evolutionary  programming. 

But  there  is  more  to  do.  We  need  to  determine  the  fundamental 
capabilities  and  limitations  of  evolutionary  computation.  What 
is  the  simplest  way  to  generate  useful  heredity  through 
reproduction  in  a  finite  noisy  environment?  What  form  of 
evolutionary  computation  should  be  used  to  treat  which  class  of 
problems?  What  are  the  trade-offs  between  population  size, 
number  of  generations,  type  of  variation,  and  so  forth?  Are 
there  heuristics  for  selecting  a  most  suitable  representation  for 
any  given  problem?  Are  there  ways  to  estimate  the  convergence 
rate  under  different  conditions?  Under  what  conditions  is  it 
better  to  evolve  weighted  rules  rather  than  simply  direct 
combinations  of  the  available  resources? 

It  might  also  prove  worthwhile  to  examine  the  manner  in  which 
evolutionary  computation  relates  to  natural  evolution.  Is  it 
worthwhile  to  replicate  the  specific  mutation/selection  process 
that  takes  place  in  nature?  How  should  genetic  variation  be 
combined  with  phenotypic  selection?  Is  it  worthwhile  to 
replicate  natural  evolution  across  the  hierarchy  from  genes 
through  organisms  (their  organization  and  species)?  What 
theory  of  natural  evolution  (Lamarckian,  Darwinian,  and  so 
forth)  is  most  appropriate  for  addressing  certain  kinds  of 
problems?  Would  it  be  advantageous  to  combine  aspects  of 
these  different  theories? 

How  can  evolutionary  computation  be  made  still  more  efficient 
and  effective?  Is  it  worthwhile  to  use  asynchronous  evolution 
to  allow  some  of  the  better  parents  to  reproduce  across 
generations  and/or  to  artificially  alter  the  response  surface  (the 
adaptive  landscape)  to  facilitate  finding  the  global  optimum? 
How  can  evolutionary  computation  benefit  from  self-adaptation 
(internally  and/or  through  the  use  of  meta-levels)?  What  are  the 
limits  of  evolving  self-referential,  self-modeling,  self-aware, 
conscious  automata  that,  in  an  artificial  social  setting,  may 
even  exhibit  conscience? 

How  should  evolutionary  computation  be  realized  under  different 
conditions/constraints?  What  is  the  most  appropriate  language 
and  coding  procedure  and  why,  given  the  circumstance?  When 
and how should distributed/parallel processing be used for
evolutionary computation? When is it advantageous to hardwire
evolutionary  computation  into  a  chip?  These  and  other 
questions  are  worthy  of  serious  consideration. 

In  the  meantime,  there  are  diverse  applications  worthy  of  top- 
down  engineering. 
EXTENDED  ABSTRACT

The  world  is  full  of  large  and  complex  engineering  problems  that  we 
do  not  know  how  to  solve  effectively  or  efficiently.  Examples  range 
from  the  design  and  operation  of  communication  networks  and 
manufacturing  plants,  to  traffic  control  on  land,  sea,  and  in  the  air,  to 
military  C3I  and  logistic  management.  Such  discrete  event  systems 
permeate  the  modern  human-made  civilization.  Typically  they  do 
not  possess  simple  analytical  structures  compared  to  the  differential 
equation  based  physical  systems;  stochastic  effects  abound  in  such 
systems;  and  finally  their  designs  involve  search  in  a  huge 
parameter/configuration  space.  While  simulation  is  a  general 
purpose  performance  analysis  tool  of  choice,  fundamental  theoretical 
limitations  on  computation  rule  it  out  as  an  optimization  tool  for 
such  problems.  In  this  talk  we  advocate  a  strategic  re-direction  of 
this general optimization problem and show that effective inroads
can be made in practice. We will discuss the results that have been
achieved so far and the problems that remain. Our thesis also advocates
the inclusion of this optimization tool as one element of the
computing intelligence arsenal. A live demonstration of the tool will
be conducted during the talk to illustrate its generality and properties.
A separate session immediately following this talk will also address
more recent developments.

Additional details can be found on the author's web site:

http://hrl.harvard.edu/people/faculty/ho/CRCD 

and  the  following  references. 
Abstract

In  this  Lecture  we  are  going  to  pave  the  way  for 
the  quantum  theory  of  open  systems.  The  question 
of the completeness of the description of physical phenomena
on the basis of quantum mechanics and other eternal
questions of the quantum theory are considered.
Applications of the S-theorem and the conservation
law of Information and Entropy are demonstrated. In
this domain there are intersections between the
Physics of Open Systems and Semiotics.

I.  Introduction 

As in the classical kinetic theory, the simplest model
in the quantum theory is the one-component
rarefied gas of structureless particles (the Boltzmann
gas). This model can be extended in two main
directions, leading to two more complicated models.

The first is the rarefied electron-ion plasma,
which is a three-component system comprising gases
of electrons and ions and the electromagnetic field.

The  second  is  the  system  of  atoms  and  field,  which 
in  the  simplest  case  consists  of  two  components.  The 
gas component differs from the Boltzmann gas in that the
structure  of  atoms  or  molecules  is  taken  into  account. 
The  second  component  is  again  the  electromagnetic 
field. 

A natural generalization of these two particular models is the
so-called plasma-molecular system, which consists of at least four
components (Klimontovich-Wilhelmsson-Yakimenko-Zagorodnii 1989). An
example of such a system is provided by partially ionized plasma. The
foundations of the kinetic theory of systems of this kind were laid
in (Klimontovich 1980 (1983); 1982 (1986); Klimontovich-Kremp-Kraeft
1981, 1987).

A consistent description of plasma-molecular systems is only possible
within the framework of quantum theory; hence the need to bridge the
gap between the classical theory of open systems and the appropriate
quantum theory.

This is best done with concrete examples of simple but real systems,
such as a system of noninteracting hydrogen atoms or free electrons
in an electromagnetic field. We shall use these examples to illustrate
the transition from the reversible microscopic operator equations to
irreversible quantum kinetic equations. Such a transition may be
interpreted as the replacement of a system of particles and field
oscillators by a continuous medium.

In particular, this implies that the Schroedinger equation of quantum
mechanics for deterministic (not operator) wave functions describes
the evolution of a continuous medium, but ignores the dissipative
terms. In this sense, there is an analogy between the Schroedinger
equation in quantum mechanics and the Euler equation in hydrodynamics
(Klimontovich 1993, 1995).

It would be natural to begin our study of the quantum system of atoms
and field with the microscopic equations, which give a complete
quantum electrodynamic description of atoms and electromagnetic field.

As in the case of the Boltzmann gas, we again face important
questions. What are the smallest scales on which the initial equations
of quantum electrodynamics lose their reversibility? What is the
cause of irreversibility?

Similar questions arise at different levels of description. We might
ask, for instance, what are the smallest scales which allow going over
from the operator equations of quantum electrodynamics to the
corresponding Schroedinger equation for the deterministic distribution
function of, let us say, the hydrogen atom.

The fact is that the quantum mechanical Schroedinger equation is
coarser than the corresponding equation of quantum electrodynamics,
since it only involves the mean field rather than the coordinates of
individual field oscillators.

The problem therefore consists in finding the starting point of the
transition towards irreversible equations. In this connection we
should mention the stimulating influence of the work of Ilya Prigogine
(Prigogine 1980; Prigogine - Stengers 1984), who for many years has
been studying the possible generalization of the second law of
thermodynamics to the microscopic level. The main role is assigned to
the dynamic instability of motion of microscopic objects. Dynamic
instability in quantum theory is manifested in a significant change of
the wave functions when the relevant initial conditions are varied
even slightly.

Recall that in the case of the Boltzmann gas the procedure of
smoothing over a physically infinitesimal volume (the "point" of the
continuous medium), which is necessary for going over from the system
of particles to the approximation of continuous medium, reflects the
existence of dynamic instability of motion. The smoothing over
physically infinitesimal scales in the construction of irreversible
equations of statistical theory is based on the existence of dynamic
instability of motion of the particles of the system. In this respect
the dynamic instability of motion plays a constructive role in the
statistical theory.

Since the theory of dynamic instability in quantum theory is still in
its early stage, it will be expedient to start with the evaluation of
the smallest scales on which smoothing is possible. This is necessary
for defining the limit of applicability of the initial microscopic
equations of quantum electrodynamics.

The question of the completeness of the description of physical
phenomena on the basis of quantum mechanics was the subject of the
famous debate between Einstein and Bohr at the 5th Solvay Conference
in 1927 (see Mehra 1975). Later this problem was discussed in the
paper entitled "Can quantum-mechanical description of physical reality
be considered complete?" (Einstein - Podolsky - Rosen 1935), and also
in the papers by de Broglie (1953) and Bohm (1952). The issue of the
"deficiency of quantum-mechanical description" is closely related to
the problem of the so-called hidden parameters in quantum mechanics.

Most physicists argued that the quantum mechanical description is
complete, and that the problem of hidden parameters does not exist.
This view was based for the most part on the book by John von Neumann
(1932), in which he proved that hidden parameters are incompatible
with the foundations of quantum theory. It was not mentioned, however,
that the Schroedinger equation of quantum mechanics is itself
approximate.

Further theoretical and experimental studies were stimulated by the
results of Bell (Bell 1965, 1966; Belinskii, Klyshko 1993), who
formulated the condition for the existence of hidden parameters as
Bell's inequality. This was seen as a new possibility of experimental
verification: if the quantum mechanical description is complete, then
Bell's inequality does not hold.

In a recent review (Belinskii - Klyshko 1993) we read:

"The problem formulated many years ago by Einstein, Podolsky and
Rosen, by Bohm and Bell, still excites the new generation of
physicists. To a large extent this is due to the fact that the
contradiction between the prediction of quantum theory and the theory
of hidden parameters can be settled in a convincing way (in favor of
quantum theory, of course) by experimentum crucis, unlike most other
quantum paradoxes. The theory of hidden parameters is closely related
to the ensemble-statistical interpretation of quantum theory, and this
is the reason why such experiments (real or imaginary) add serious
evidence to the eternal debate between the advocates of statistical
(Einstein) and orthodox (Bohr's or Copenhagen) interpretations, and
their numerous modifications."

And in the next paragraph:

"It is yet possible that in the future this debate will be settled
(perhaps in favor of a third way), and historians of science will see
it as a vivid example of fallacies which plagued even the brightest
minds of the past."

So the authors do not rule out the possibility of a third way.

Let us take advantage of this option. Our treatment will be largely
based on the ideas and methods described in detail in (Klimontovich
1993, 1995, 1997). We would like to point out the recent paper
(Kadomtsev 1994) in which the principal problems of quantum theory are
discussed.

II.  Microscopic  and  Macroscopic 
Schroedinger  Equations 

Imagine a quantum system which consists of an ideal gas of "particles"
(hydrogen atoms, for example) and a fluctuating electromagnetic field.

The microscopic description of quantum processes in such a system
starts with the reversible dynamic equations for the particles and
field. There are two possible approaches. One is based on the
Schroedinger equation for the wave function of the complete set of
variables of particles and field. The other relies on the equation for
the quantum (operator) wave function of the particles and the
oscillators of the electromagnetic field. In both cases the initial
dynamic equations give, in principle, an exhaustive quantum mechanical
description of the time evolution of the system under consideration.
Such a level of description corresponds to a pure ensemble.

For the sake of simplicity, we shall describe the interaction with the
electromagnetic field in the dipole approximation. So we have a closed
set of equations for the operator wave function $\hat{\Psi}(r, R, t)$
of the electron in the hydrogen atom and the field operator
$\hat{E}(R, t)$. Instead of the operator $\hat{\Psi}(r, R, t)$ it is
useful to use the quantum operator $\hat{f}(r, p, t)$, which
corresponds to the Wigner quantum distribution function $f(r, p, t)$
in the six-dimensional phase space.

However, owing to the nonlinearity of the operator equations,
averaging over the Gibbs ensemble results in a very complicated chain
of coupled equations for the moments of different order. Accordingly,
we once again have to deal with the problem of closure, so as to
obtain a closed system of approximate equations.

In quantum mechanics the equation for the operator wave function
$\hat{\Psi}(r, R, t)$ is replaced by the Schroedinger equation for the
deterministic wave function $\psi(r, R, t)$ of the electron in a
hydrogen atom ($\hat{\Psi}(r, R, t) \to \psi(r, R, t)$):

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial r^2} + U(r)\psi - dE(R,t)\psi; \qquad \int |\psi(r,R,t)|^2\, dr\, dR = 1 \qquad (1)$$

Here $d = er$ is the dipole moment of the atom.


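This equation is approximate but still reversible. From (1) we can go
over to the equation for the deterministic "quantum distribution
function" in Wigner's form

$$f(r,p,R,t) = \frac{1}{(2\pi)^3}\int \psi^*\!\left(r - \tfrac{1}{2}\hbar\tau, R, t\right)\psi\!\left(r + \tfrac{1}{2}\hbar\tau, R, t\right)\exp(-i\tau p)\, d\tau \qquad (2)$$

$$\int f(r,p,R,t)\, dr\, dp\, dR = \int |\psi(r,R,t)|^2\, dr\, dR = 1 \qquad (3)$$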


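Let us also write the corresponding kinetic equation:

$$\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial r} + eE(R,t)\frac{\partial f}{\partial p} + \frac{i}{\hbar(2\pi)^3}\int\left[U\!\left(r + \tfrac{1}{2}\hbar\tau\right) - U\!\left(r - \tfrac{1}{2}\hbar\tau\right)\right] f(r,p',R,t)\exp[i\tau(p' - p)]\, d\tau\, dp' = 0 \qquad (4)$$

which in the classical limit becomes

$$\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial r} + eE(R,t)\frac{\partial f}{\partial p} - \frac{\partial U}{\partial r}\frac{\partial f}{\partial p} = 0 \qquad (5)$$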

We see that the Schroedinger equation (1) for the deterministic wave
function corresponds in the classical limit to the kinetic equation
for the one-point distribution function $f(r, p, R, t)$. Like the
Schroedinger equation, this equation is reversible, since it does not
take into account the dissipation due to the interaction of the
electron with the fluctuating electromagnetic field.

Now, to facilitate the transition to the irreversible equations, let
us refresh some points from Boltzmann's kinetic theory.

The kinetic Boltzmann equation differs from (5) in that it includes
the "collision integral", which takes care of the dissipation due to
the redistribution of the particles' velocities because of collisions
between them. By $\tau$ and $l$ we denote the relaxation parameters:
the free path time and length. We also introduce the characteristic
parameters of the problem, $T$ and $L$.

As we know from gas kinetic theory, two extreme cases are important,
corresponding to the approximations of gas dynamics and of free
molecular flow. The gasdynamic approximation is used when
$\tau \ll T$, $l \ll L$. Then the kinetic Boltzmann equation may be
replaced by the simpler equations for the gasdynamic functions
$\rho(R, t)$, $u(R, t)$, $T(R, t)$.

In the opposite extreme case, when $\tau \gg T$, $l \gg L$, the
dissipative term (the "collision integral") in the zero approximation
may be dropped. This brings us to the reversible kinetic equation,
which formally coincides with (5).

Is it possible to exploit this analogy?

Recall first of all that the nature of the description changes
dramatically when we go over from the Liouville equation, which
carries all information about the motion of the particles of the
system, to the Boltzmann equation. To wit, from the system of
particles whose motion is described by the reversible Hamilton
equations, we go to a continuous medium in the six-dimensional space
of coordinates and momenta. Naturally, the transition to the
approximation of continuous medium is associated with a restriction
from the side of small scales, which define the size of the "point".
Since the information about the motion of particles within the "point"
is lost, the equation of the continuous medium must be dissipative and
therefore irreversible.

This transition is necessitated by the dynamic instability of motion
of the microscopic elements of the system. It is the dynamic
instability of motion, combined with uncontrollable small
perturbations from the surrounding world, that makes the transition to
irreversible equations inevitable.

In this connection it is worthwhile to recall that dissipation is
usually seen as damping of motion, scattering of energy, loss of
information. At nonequilibrium phase transitions, however, which may
result in the appearance of new structure, the dissipation plays a
constructive role.

This explains why the construction of dissipative equations is of
crucial importance for the statistical theory of open systems. The
first step in this direction consists in the revision of the concept
of continuous medium. This requires giving a concrete definition of
physically infinitesimal scales.

It is known how to do this for the Boltzmann gas and for a plasma
(Klimontovich 1983, 1985, 1987). How then do we apply this knowledge
to the quantum system of atoms and electromagnetic field?

Let us return to the classical kinetic equation (5). For the Boltzmann
gas, for example, it can be used for describing free molecular flows.
The approximation of continuous medium still holds, however, since
this equation does not carry information about the motion of
individual particles. This restricts the admissible values of $T$ and
$L$ from below: they must be much greater than the relevant physically
infinitesimal scales:

$$\tau \gg T \gg \tau_{ph}, \qquad l \gg L \gg l_{ph}. \qquad (6)$$


And yet, equation (5) is reversible, because it corresponds to the
zero approximation in the small parameter $L/l$.

What  is  the  situation  in  quantum  theory? 

III. Continuous Medium Approximation in
Quantum Theory

To take care of the dissipation, we must take into consideration the
interaction of atoms with the fluctuating electromagnetic field. This
will result in a dissipative kinetic equation for the quantum
distribution function (or density matrix) and the field.

The appropriate "collision integrals" are determined by the
small-scale fluctuations whose characteristic scales are much smaller
than the characteristic scales of the kinetic equations. In this way,
the problem of the structure of the "continuous medium" is brought up
explicitly in the derivation of kinetic equations.

This problem is closely associated with the definition of kinetic
equations.

In the pure ensemble, the operator density matrix (for example, in
Wigner's representation (Klimontovich 1958, 1995; Klimontovich, Silin
1960; Brittin and Chapell 1961; Hillery, O'Connell, Scully, Wigner
1994)) is expressed via the product of operator wave functions
$\hat{\Psi}(r, R, t)$. Upon transition to the continuous medium, when
the kinetic equations become irreversible, the pure ensemble is
replaced by a "mixed" ensemble. Then there is no representation in
which the density matrix can be expressed exactly via the product of
wave functions!

It might seem that the above formulas defy this statement. Indeed, the
quantum distribution function (2) is defined as a product of wave
functions which satisfy the Schroedinger equation (1). It is as if we
returned to the "pure ensemble" again.


Naturally, there is a fundamental difference between these two
definitions of pure ensemble. The first exactly renders the
statistical properties of the quantum mechanical description, whereas
the second definition of pure ensemble corresponds to the
approximation of continuous medium. The approximation amounts to
neglecting the dissipation altogether.

For a system of N particles this approximation corresponds to the
Hartree equation in quantum mechanics, or to the self-consistent
Vlasov approximation in plasma theory, or, finally, to the Euler
approximation in hydrodynamics, which disregards the dissipation due
to viscous friction and heat conduction.

In order to define the structure of the "continuous medium" in quantum
theory, we must first of all introduce the characteristic scales of
the system in question. The characteristic length and frequency for
the hydrogen atom are the Bohr radius $r_0 = \hbar^2/(me^2)$ and the
frequency $\omega_0$.

The Bohr radius $r_0$ is the characteristic scale of the ground state
distribution $|\psi(r)|^2$; the frequency $\omega_0$ determines the
energy of the ground state.

Additional  parameters  arise  when  we  go  over  to  the 
dissipative  kinetic  equations.  We  divide  them  into  two 
classes. 

The first class includes those parameters which characterize the
process of relaxation towards equilibrium. The relaxation time is
defined by Einstein's coefficient for spontaneous transitions,
$A_{nm}$.

Einstein's coefficient is proportional to the coefficient of radiation
friction at the transition frequency. The following estimate then
holds:

$$A_{nm} \sim \mu^3\omega_{nm} \ll \omega_{nm}, \quad \text{where } \mu = \frac{e^2}{\hbar c} \qquad (9)$$

is the "fine structure constant".

At zero temperature the atoms are in the ground state, and the density
of the electron's position in the atom is isotropic. The most probable
distance from the nucleus is characterized by Bohr's radius $r_0$.
Now let us try to answer the following questions: does the
distribution of the electron's position in the ground state display a
finer structure? What is the shortest time which the isotropic
distribution takes to become established in the ground state?

It is obvious that the structure of the "continuous medium" is defined
by scales smaller than those of (8). In classical and quantum
electrodynamics there are two scales which are smaller than Bohr's
radius $r_0$:

$$r_e = \frac{e^2}{mc^2} \sim \mu^2 r_0, \qquad \lambda_c = \frac{\hbar}{mc} \sim \mu r_0 \ll r_0, \quad \text{where } \mu = \frac{e^2}{\hbar c}. \qquad (10)$$


The first is the "classical electron radius". The main contribution to
the effective cross section of scattering of photons by free electrons
(Thomson scattering) is proportional to $r_e^2$. The second parameter
is Compton's length, which defines the shift of wavelength in the
scattering of x-rays by free electrons. Which of these characterizes
the beginning of irreversibility, and thus defines the finest
structure of the "continuous medium", the size of the "point" in
quantum mechanics?
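The ordering of the three length scales in (10) can be checked numerically. The following is a minimal sketch, not part of the original lecture: it assumes SI values of the constants (CODATA 2018, hard-coded), replaces the Gaussian $e^2$ of the text by $e^2/(4\pi\varepsilon_0)$, and takes Compton's length in its reduced form $\hbar/mc$. The function name `atomic_scales` is purely illustrative.

```python
import math

# CODATA 2018 values in SI units, hard-coded so the sketch has no dependencies.
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C
M_E = 9.1093837015e-31      # electron mass, kg
C = 299792458.0             # speed of light, m/s
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m

# The Gaussian e^2 used in the text corresponds to e^2 / (4*pi*eps0) in SI.
K_E2 = E_CHARGE**2 / (4.0 * math.pi * EPS0)


def atomic_scales():
    """Return (mu, r_0, lambda_c, r_e); all lengths are in metres."""
    mu = K_E2 / (HBAR * C)        # fine structure constant e^2/(hbar c)
    r_0 = HBAR**2 / (M_E * K_E2)  # Bohr radius
    lambda_c = HBAR / (M_E * C)   # reduced Compton length (assumed form)
    r_e = K_E2 / (M_E * C**2)     # classical electron radius
    return mu, r_0, lambda_c, r_e


if __name__ == "__main__":
    mu, r_0, lambda_c, r_e = atomic_scales()
    print(f"mu           = {mu:.6e}")
    print(f"r_0          = {r_0:.6e} m")
    print(f"lambda_c     = {lambda_c:.6e} m")
    print(f"r_e          = {r_e:.6e} m")
    print(f"lambda_c/r_0 = {lambda_c / r_0:.6e}  (compare mu)")
    print(f"r_e/r_0      = {r_e / r_0:.6e}  (compare mu**2)")
```

With these definitions the ratios $\lambda_c/r_0 = \mu$ and $r_e/r_0 = \mu^2$ hold identically, so each scale is smaller than the previous one by a factor of about 1/137.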

Note that the study of the structure of the "continuous medium" in
quantum mechanics is stimulated, in particular, by the difficulties
associated with attempts to calculate quantum fluctuations, such as
the fluctuations of velocity of a free electron moving in a
fluctuating electromagnetic field (Klimontovich 1990 (1991), 1993,
1995). The solution of this problem makes it possible to define the
smallest relaxation time over which irreversibility sets in. This
smallest relaxation time is defined by the classical electron radius
$r_e$: $\tau_e \sim r_e/c$.

Now we shall show that the scales $r_e$, $\tau_e$ define the fine
structure of the atom's ground state.

IV. Establishment of Ground State in
Atom-Oscillators

Consider a gas of noninteracting atoms in an equilibrium
electromagnetic field. Following the Thomson model, we regard the atom
as a sphere of radius $r_0$ which carries a uniformly distributed
positive charge. The electron vibrates with respect to the center of
the sphere. The frequency of oscillations is found from the expression
for the elastic force

$$eE = -\frac{e^2}{r_0^3}\, r = -m\omega_0^2 r; \qquad \omega_0^2 = \frac{e^2}{m r_0^3} \qquad (12)$$

For the sake of simplicity we consider one-dimensional oscillations of
the atom-oscillators. In the equilibrium state the mean kinetic and
potential energies are defined by the expressions:

$$m\langle v^2\rangle = m\omega_0^2\langle x^2\rangle = kT_{\omega_0} \equiv \frac{\hbar\omega_0}{2}\coth\frac{\hbar\omega_0}{2kT} \qquad (13)$$

For atom-oscillators at room temperature $\coth(\hbar\omega_0/2kT)$ is
close to one, and the amplitudes $v_0$, $x_0$ are therefore defined by
the zero-point energy $\hbar\omega_0/2$. So we have expressions for the
frequency and amplitude of atom-oscillators which are in equilibrium
with the electromagnetic field. From them we find the amplitude,
frequency and velocity of the atom-oscillator (Klimontovich 1995),
which are known in quantum mechanics as Bohr's parameters. Now these
parameters are not a consequence of the Schroedinger equation for
atom-oscillators; they represent the condition of equilibrium between
the atom-oscillators and the electromagnetic field.

To describe the distribution in the space $(x, v)$ for a gas of
noninteracting atom-oscillators in an equilibrium electromagnetic
field, we can use the generalized Fokker-Planck equation (17.3.13) in
(Klimontovich 1995) for the distribution function $f(x, v, t)$:

$$\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial x} - \omega_0^2 x\frac{\partial f}{\partial v} = \frac{\partial}{\partial v}\left[D_{(v)}\frac{\partial f}{\partial v} + \gamma v f\right] + \frac{\partial}{\partial x}\left[D_{(x)}\frac{\partial f}{\partial x} + \Gamma x f\right] \qquad (16)$$

Here the diffusion and friction coefficients in velocity space are
given by the expressions:

$$D_{(v)} = \gamma\frac{kT_{\omega_0}}{m}, \qquad \gamma = \frac{2e^2\omega_0^2}{3mc^3} \qquad (17)$$

The spatial diffusion and friction coefficients are

$$D_{(x)} = \frac{kT_{\omega_0}}{m\gamma} = \Gamma x_0^2, \qquad \frac{\omega_0^2}{\gamma} = \Gamma \qquad (18)$$

The equilibrium solution of this equation is defined by Planck's
formula:

$$f(x, v) = C\exp\left(-\frac{mv^2/2 + m\omega_0^2x^2/2}{kT_{\omega_0}}\right), \qquad kT_{\omega_0} = \frac{\hbar\omega_0}{2}\coth\frac{\hbar\omega_0}{2kT} \qquad (19)$$

Now we can show that the relaxation times $\tau_{(v)}$, $\tau_{(x)}$
differ very much.

Indeed, the corresponding diffusion times are defined by the
expressions

$$\tau_{(v)} = \frac{v_0^2}{D_{(v)}} = \frac{1}{\gamma}; \qquad \tau_{(x)} = \frac{x_0^2}{D_{(x)}} = \frac{1}{\Gamma} \ll \tau_{(v)}. \qquad (20)$$


The "strength" of the inequality $\ll$ can be expressed in terms of
the "fine structure constant" $\mu = e^2/\hbar c$:

$$\tau_{(x)} \sim \mu^6\tau_{(v)} \quad \text{and} \quad \tau_{(x)} \sim \mu^3\frac{1}{\omega_0}; \qquad \tau_{(v)} \sim \frac{1}{\mu^3}\frac{1}{\omega_0} \qquad (21)$$

We see that the spatial diffusion is much faster than the diffusion in
velocity space. Thus, there are two stages ("fast" and "slow") of the
evolution to the equilibrium distribution (19).

The "fast" process of evolution is characterized by the time of
spatial diffusion $\tau_{(x)} = x_0^2/D_{(x)} = 1/\Gamma$. We see that
the quantum mechanical ground state is established during a very short
time interval $\tau_{(x)} \sim r_e/c \sim \mu^3(1/\omega_0)$, which is
much less than the period of Bohr oscillations.

The diffusion time for the "slow" process is defined by the time of
spontaneous emission, $\tau_{(v)} \sim 1/\gamma(\omega_0)$.

Thus, within the atomic limits the spatial diffusion is the fastest
process. As a result, the Boltzmann distribution with respect to $x$
is established on this scale within the time $\tau_{(x)} \sim r_e/c$
and can be used as the smoothing function for carrying out the
transition to the smoother function $f(x, v, t)$ (Klimontovich 1995).

Marking the function which satisfies (16) with a tilde, we may define
the operation of smoothing as

$$f(x, v, t) = \int \tilde{f}(x - x', v, t)F(x')\, dx'; \qquad F(x) = \frac{1}{\sqrt{2\pi x_0^2}}\exp\left(-\frac{x^2}{2x_0^2}\right) \qquad (22)$$

We see that the scale $x_0$ ("Bohr's radius") defines the size of the
"point" of the quantum continuous medium for the system considered,
and the scale $r_e$ defines the corresponding physically infinitesimal
time.

After smoothing, the second dissipative term in (16) drops out, and we
arrive at the standard Fokker-Planck equation commonly used for
describing the Brownian motion of a harmonic oscillator:

$$\frac{\partial f}{\partial t} + v\frac{\partial f}{\partial x} - \omega_0^2 x\frac{\partial f}{\partial v} = D_{(v)}\frac{\partial^2 f}{\partial v^2} + \frac{\partial}{\partial v}\left[\gamma v f\right]. \qquad (23)$$

For quantum atom-oscillators the diffusion and friction coefficients
are defined by the expressions (17).

We see that in this equation the scale on which the spatial diffusion
takes place within the volume of an atom (in the volume of the ground
state) is a "hidden parameter" (or "hidden scale")!

The inequality $\tau_{(v)} \gg 1/\omega_0$, which follows from the last
relation in (21), allows us to further simplify equation (23) by
averaging over the period of oscillations. As a result we obtain
equation (24), which describes only the slowest process of relaxation,
over a time of the order $1/\gamma \gg 1/\omega_0$.

We see that it is possible to describe the Brownian motion of quantum
atom-oscillators with a varying degree of detail. The most detailed
information is contained in the Fokker-Planck equation (16), where the
negligibly small ("hidden") scales are less than $r_e$, $\tau_e$. The
least detailed information is contained in the Fokker-Planck equation
(24).

Other examples of ground state structure can be found in the last
chapter of the book (Klimontovich 1995).
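The separation of the "fast" and "slow" relaxation times can be illustrated with a short numerical sketch. It is not part of the original lecture and rests on several assumptions: SI constants (CODATA 2018, hard-coded) with the Gaussian $e^2$ replaced by $e^2/(4\pi\varepsilon_0)$; the Thomson-model frequency (12) with $r_0$ equal to the Bohr radius; the radiation friction coefficient $\gamma = 2e^2\omega_0^2/3mc^3$ of (17); and the reading $D_{(x)} = kT_{\omega_0}/(m\gamma)$ of (18). The function name `relaxation_times` and the dictionary keys are illustrative only.

```python
import math

# CODATA 2018 values in SI units, hard-coded so the sketch has no dependencies.
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C
M_E = 9.1093837015e-31      # electron mass, kg
C = 299792458.0             # speed of light, m/s
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K


def relaxation_times(temperature=300.0):
    """Estimate tau_(x) and tau_(v) for the Thomson atom-oscillator."""
    k_e2 = E_CHARGE**2 / (4.0 * math.pi * EPS0)  # Gaussian e^2 expressed in SI
    mu = k_e2 / (HBAR * C)                       # fine structure constant
    r_0 = HBAR**2 / (M_E * k_e2)                 # Bohr radius (assumed sphere radius)
    r_e = k_e2 / (M_E * C**2)                    # classical electron radius

    omega_0 = math.sqrt(k_e2 / (M_E * r_0**3))   # eq. (12)
    gamma = 2.0 * k_e2 * omega_0**2 / (3.0 * M_E * C**3)  # eq. (17)

    # Effective temperature kT_{omega_0} of eq. (13).
    coth = 1.0 / math.tanh(HBAR * omega_0 / (2.0 * K_B * temperature))
    kt_eff = 0.5 * HBAR * omega_0 * coth

    v0_sq = kt_eff / M_E                 # m <v^2> = kT_{omega_0}
    x0_sq = kt_eff / (M_E * omega_0**2)  # m omega_0^2 <x^2> = kT_{omega_0}
    d_v = gamma * kt_eff / M_E           # velocity diffusion, eq. (17)
    d_x = kt_eff / (M_E * gamma)         # spatial diffusion, assumed form of eq. (18)

    return {
        "mu": mu,
        "r_e": r_e,
        "omega_0": omega_0,
        "coth": coth,
        "tau_v": v0_sq / d_v,  # equals 1/gamma
        "tau_x": x0_sq / d_x,  # equals gamma/omega_0^2
    }


if __name__ == "__main__":
    for name, value in relaxation_times().items():
        print(f"{name:8s} = {value:.6e}")
```

Under these assumptions $\tau_{(x)} = \tfrac{2}{3}\, r_e/c$, $\tau_{(x)}\omega_0 = \tfrac{2}{3}\mu^3$ and $\tau_{(x)}/\tau_{(v)} = \tfrac{4}{9}\mu^6$, in agreement with the order-of-magnitude relations (21); the numerical factors 2/3 and 4/9 come from the coefficient in $\gamma$ and are not claimed by the text.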

V. About Some Eternal Questions of
Quantum Mechanics

We saw that the concept of "pure ensemble" is clearly defined only in
the case of a complete quantum mechanical description of the system of
atoms and electromagnetic field.

In quantum mechanics, however, the term "pure ensemble" is also used
in those cases when the description is based on the Schroedinger
equation (1) for the deterministic wave function $\psi(r, t)$ of the
particle variables only. This approach corresponds to the
approximation of continuous medium in which the dissipation is not
taken into account. On this ground it is impossible to describe
transitions between stationary levels accompanied by emission of
radiation.

Since the description based on the Schroedinger equation is not
complete, there exist "hidden parameters" ("hidden scales") which are
revealed by the use of a more realistic approach to the problem under
consideration.

We have illustrated this possibility with a concrete example of the
system of atom-oscillators and field. By introducing the scales of the
"continuous medium" we were able to take into account the small-scale
fluctuations in the quantum Fokker-Planck equation.

The calculation of small-scale fluctuations is also of interest in
itself: these fluctuations, for instance, are definitive for the
effective cross section of scattering of photons by free electrons
(Klimontovich 1993, 1995).

It is important that the effective cross section in the quantum domain
(visible and x-ray ranges) does not depend on Planck's constant, and
can therefore be found from classical calculations. The relevant
scales $r_e$, $\tau_e$ are so small that Heisenberg's uncertainty
relation does not hold.

In this way, there are two "exits" from quantum theory. One of these
corresponds to the domain of large scales and slow spatial variations.
The other is associated with the transition to scales which are much
smaller than any of the quantum scales of the system in question,
which brings us into the realm of "hidden parameters" ("hidden
scales").

The second question of this section, which was the subject of the
famous debate between Einstein and Bohr, is "Is Quantum Mechanical
Description Complete?", the title of the famous paper (Einstein,
Podolsky, Rosen 1935). In the light of the arguments developed above,
the answer to this question is negative. The fact is that a quantum
mechanical description based on the reversible Schroedinger equation
can always be supplemented by the inclusion of fluctuations. Then,
owing to the existence of fluctuation-dissipation relations,
dissipation is inevitable. Consequently, any kind of quantum
mechanical description is in practice incomplete, and the concept of
"pure ensemble" is just an abstraction.

The above arguments count in favor of Einstein's standpoint. In this
connection it is interesting to note that the problem of the
incompleteness of the quantum mechanical description had actually been
solved by Einstein long before the emergence of quantum mechanics as
such.

As early as 1916 Einstein formulated the concept of spontaneous
emission for the case of a two-level atom interacting with the
equilibrium electromagnetic field. This term emphasizes the
inevitability of energy loss by radiation (Einstein 1916). Dissipation
is thus inseparable from real processes, and the Schroedinger
equations for the hydrogen atom and other mechanical systems can only
be regarded as a useful idealization.

At this point we would like to quote Ilya Prigogine, who said
(Prigogine 1980, p. 70):

"...Or, on the contrary, should we argue that nobody has ever seen an
atom that would not decay when brought into an excited level? The
physical reality then corresponds to systems with continuous spectra,
whereas standard quantum mechanics appears only as a useful
idealization, as a simplified limiting case."

Prigogine does not belong to the founding fathers of
quantum mechanics. His statement, however, closely
echoes the words of Louis de Broglie.

De  Broglie  expressed  doubt  in  the  completeness  of 
quantum  mechanical  description  even  at  the  heyday 
of  the  quantum  theory.  His  scepticism  was  not,  how- 
ever, shared  by  his  contemporaries,  and  so  he  also 
abandoned  this  attitude,  becoming  one  of  the  most 
brilliant  advocates  of  the  "Copenhagen"  formulation 
of  quantum  mechanics.  He  recalled  this  period  in  the 
following  words  (de  Broglie  1952): 

"Some  people,  remembering  that  I  abandoned  my 
first  attempts  and  used  the  interpretation  of  Bohr 
and Heisenberg in all my works for twenty-five years
thereafter,  will  accuse  me  of  being  inconsistent  when 
they  see  that  I  am  again  doubtful  and  ask  myself 
whether  my  initial  orientation  had  been  right  after 
all. Should I feel like joking, I can reply in Voltaire's
words, "Only foolish people never change their minds".

"The  answer,  however,  can  be  more  prudent. 

"The  progress  of  science  is  continually  harassed  by 
the tyrannical influence of certain concepts which in the
course  of  time  have  become  dogmas.  Because  of  this, 
the  principles  which  have  been  recognized  as  final 
must  be  subject  to  most  thorough  revision". 

At that time these words were the voice of one crying
in the wilderness. This can be illustrated with a
quotation from an article published in an influential
American newspaper:

"The  principle  of  uncertainty  has  eventually  made 
all  contemporary  physicists  (with  the  exception  of 
Dr. Einstein)  recognize  that  there  is  no  causality  or  de- 
terminism in  nature.  Dr. Einstein  in  majestic  solitude 
has  held  out  against  all  these  concepts  of  quantum 
theory" 

(The  New  York  Times,  30  March  1952). 

Yet another pertinent passage is taken from Dirac's
paper published shortly before his death (Dirac 1978):

"I  think  it  might  turn  out  that  ultimately  Einstein 
will  prove  to  be  right,  because  the  present  form  of 
quantum  mechanics  should  not  be  considered  as  the 
final  form.  There  are  great  difficulties  (which  I  shall 
mention later) in connection with the present quantum
mechanics. It is the best that one can do up till
now. 

But, one should not suppose that it will survive
indefinitely into the future. And I think that it is quite
likely that at some future time we may get an improved
quantum mechanics in which there will be a return
to determinism and which will, therefore, justify the
Einstein point of view.

But  such  return  to  determinism  could  only  be  made 
at  the  expense  of  giving  up  some  other  basic  idea 
which  we  now  assume  without  question.  We  would 
have  to  pay  for  it  some  way  which  we  yet  cannot  guess 
at,  if  we  are  to  reintroduce  determinism." 

We do not wish to comment on these statements: the
clarity and boldness of the classics can only be
admired. Their words encourage further studies in the
quantum statistical theory of open systems, which will
be continued in the next part of this Lecture. We have
tried to define the author's position with regard to the
fundamental problems of quantum theory, and shall try
to continue along this path.

VI. Statistical presentation of the Heisenberg uncertainty principle

VI. A.  The  oscillatory  form  of  Heisenberg  relation 

As is well known from textbooks on quantum mechanics,
the Heisenberg uncertainty principle follows
from the inequality

$$\int \left( \frac{x^2}{L^2}\,\psi^*\psi + L^2\,\frac{d\psi^*}{dx}\frac{d\psi}{dx} \right) dx \ge 1. \tag{25}$$



Here  L  is  any  length  parameter. 

Let f(x,p,t) be a quantum distribution function, the
Wigner function (2). Then the last inequality can be
presented in the following form:

$$\int \left( \frac{x^2}{L^2} + \frac{L^2 p^2}{\hbar^2} \right) f(x,p,t)\,\frac{dx\,dp}{2\pi\hbar} \ge 1. \tag{26}$$

The left side of this inequality can be presented as the
mean value of the energy of a harmonic oscillator whose
proper frequency is defined by the relation

$$\omega_0 = \frac{\hbar}{mL^2}. \tag{27}$$

Thus

$$\int \left( \frac{m\omega_0^2 x^2}{2} + \frac{p^2}{2m} \right) f(x,p,t)\,\frac{dx\,dp}{2\pi\hbar} \ge \frac{\hbar\omega_0}{2}. \tag{28}$$

This means that the mean value of the energy of a
harmonic oscillator cannot be less than the
corresponding zero-point energy

$$\langle E \rangle \ge \frac{\hbar\omega_0}{2}. \tag{29}$$

The inequality presented here can be written in other
forms as well, and from it the Heisenberg uncertainty
relation follows

$$\langle x^2 \rangle \langle p^2 \rangle \ge \frac{\hbar^2}{4}. \tag{32}$$
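
The passage from the oscillatory inequality to relation (32) can be spelled out. The following is only a sketch, assuming the inequality is read in the averaged form $\langle x^2\rangle/L^2 + L^2\langle p^2\rangle/\hbar^2 \ge 1$ for every value of the length parameter L; the auxiliary function g is our notation and does not appear in the source.

```latex
\begin{align*}
g(L^2) &= \frac{\langle x^2\rangle}{L^2} + \frac{L^2\langle p^2\rangle}{\hbar^2} \ge 1
  \quad \text{for all } L, \\
\frac{dg}{d(L^2)} &= -\frac{\langle x^2\rangle}{L^4} + \frac{\langle p^2\rangle}{\hbar^2} = 0
  \;\Rightarrow\;
  L^2_{\min} = \hbar\sqrt{\frac{\langle x^2\rangle}{\langle p^2\rangle}}, \\
g(L^2_{\min}) &= \frac{2}{\hbar}\sqrt{\langle x^2\rangle\langle p^2\rangle} \ge 1
  \;\Rightarrow\;
  \langle x^2\rangle\langle p^2\rangle \ge \frac{\hbar^2}{4}.
\end{align*}
```

Since the inequality holds for any L, it holds in particular at the minimizing value, which is where the bound becomes sharpest. When the equality sign is attained, the minimizing value reduces to $L^2 = 2\langle x^2\rangle$.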


In the general case the parameters L and $\omega_0$ have
arbitrary values. For the sign "=", however, these
parameters are not arbitrary; there are restrictions on them:

$$L^2 = \frac{\hbar}{m\omega_0} = 2\langle x^2\rangle = \frac{\hbar^2}{2\langle p^2\rangle}.$$

VI.B. The sign "=". Distribution functions

For the sign "=" the equation (25) has the following
solution

$$\psi(x) = \frac{1}{(\pi L^2)^{1/4}} \exp\left(-\frac{x^2}{2L^2}\right). \tag{35}$$

The corresponding solution of the equation (26)
can be presented as the Wigner distribution
for the harmonic oscillator with the proper frequency $\omega_0$:

$$f(x,p) = 2\exp\left(-\frac{x^2}{2\langle x^2\rangle} - \frac{p^2}{2\langle p^2\rangle}\right). \tag{36}$$

The dispersions $\langle x^2\rangle$, $\langle p^2\rangle$ are defined by relations
(34).

VII. S-theorem for quantum systems.
Relative ordering of states "=", ">"

VII.A. S-theorem

Of  all  macroscopic  functions,  only  entropy  S  pos- 
sesses a  combination  of  properties  that  allow  it  to  be 
used  as  a  measure  of  uncertainty  in  the  statistical  de- 
scription of  processes  in  macroscopic  systems. 

Entropy being the sole function with the properties of
a measure of chaos, there is but one option. It is
necessary to redefine entropy so that the average energy
remains constant in the course of evolution.

Besides evolution in time, it is equally possible to
consider the evolution of stationary states in open
systems under slowly changing control (governing)
parameters. It is for this type of evolution that the
criterion was introduced (Klimontovich 1983, 1984, 1988,
1990 (1991), 1995). This criterion was first formulated
for specific cases (Klimontovich 1983, 1984)
and called the "S-theorem". Later, its general
formulation was suggested, to make possible a direct
comparison of the relative degree of order from
experimental data (Klimontovich 1988).

Here we shall consider the evolution of quantum
states corresponding, respectively, to the signs "=" and
">" in the Heisenberg relation.

In the general case, the degrees of order of the
distinguished states differ, which accounts for one of
them being more chaotic than the other. Let us term it
"physical chaos". As a rule, this state is nonequilibrium
and more ordered than the equilibrium state.

Let us assume that the quantum state corresponding
to the sign "=" is the most chaotic. For this
state the quantum distribution function $f_0(x,p)$ is
determined by the expression (36). The corresponding
entropy is denoted $S_0[x,p]$.

To change the mean value of the energy we introduce
a nonzero temperature T.

Let the quantum distribution function f(x,p,t)
characterize any nonstationary state with the sign
">" in the Heisenberg uncertainty relation. The
quantum distribution function f(x,p,t) may also take
negative values, but the corresponding distribution
functions taken separately for coordinates and momenta
are positive in all cases.

The mean energy for the state with the sign "=" is
given by the zero-point energy. From the inequality (29),
the mean energy for the state with the sign ">" is
greater than this. But according to the S-theorem, to
determine the relative degree of order it is necessary
to compare the states at equal values of the mean
energy. To satisfy this condition it is necessary to
replace the distribution function by the renormalized one

$$f_0(x,p) \to \tilde{f}_0(x,p). \tag{39}$$



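The renormalized distribution function is also Gaussian
but with renormalized values $\widetilde{\langle x^2\rangle}$, $\widetilde{\langle p^2\rangle}$ for the
dispersions. The entropy is written as the sum of the
contributions of the coordinate and momentum distributions:

$$-\int f(x)\ln f(x)\,\frac{dx}{L} - \int f(p)\ln f(p)\,\frac{L\,dp}{2\pi\hbar} = S[x] + S[p]. \tag{43}$$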

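To find the necessary value of the temperature T we
must solve the equation expressing the equality of the
mean energies of the two states:

$$\int \left( \frac{m\omega_0^2 x^2}{2} + \frac{p^2}{2m} \right) \tilde{f}_0(x,p)\,\frac{dx\,dp}{2\pi\hbar} = \int \left( \frac{m\omega_0^2 x^2}{2} + \frac{p^2}{2m} \right) f(x,p,t)\,\frac{dx\,dp}{2\pi\hbar}. \tag{44}$$

The solution of this equation is

$$T(t) > 0, \tag{45}$$

therefore the choice of the state with the sign "="
in the Heisenberg uncertainty relation as the more
chaotic state is correct. In (44) the variable t for
nonequilibrium states plays the role of a parameter.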

Using the expression (40) for the renormalized
distribution function $\tilde{f}_0(x,p)$ and the constancy
condition (44) for the average energy, the expression for
the entropy difference of the states with the signs "=" and ">"
can be presented as the inequality

$$\tilde{S}[x,p] - S[x,p] = \tilde{S}[x] - S[x] + \tilde{S}[p] - S[p] = \int \ln\frac{f(x)}{\tilde{f}_0(x)}\, f(x)\,\frac{dx}{L} + \int \ln\frac{f(p)}{\tilde{f}_0(p)}\, f(p)\,\frac{L\,dp}{2\pi\hbar} \ge 0. \tag{46}$$

Thus, the state with the sign "=" in the Heisenberg
relation is the most chaotic. The last expression serves
as the quantitative measure of the relative degree of
order of any quantum state, stationary or nonstationary,
with respect to the most chaotic state, which corresponds
to the sign "=" in the Heisenberg uncertainty relation.
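
The comparison at equal mean energy can be illustrated numerically. The sketch below is a classical one-dimensional analogue and not the quantum calculation of the text: it assumes that equal mean energy reduces to equal dispersion, takes a Gaussian as the most chaotic reference state, and uses a Laplace distribution as a hypothetical more ordered state. All function names are illustrative.

```python
import math

def log_gauss(x, var):
    """Log-density of a zero-mean Gaussian with variance var."""
    return -x * x / (2.0 * var) - 0.5 * math.log(2.0 * math.pi * var)

def log_laplace(x, var):
    """Log-density of a zero-mean Laplace distribution with variance var."""
    b = math.sqrt(var / 2.0)  # the Laplace variance equals 2 * b**2
    return -abs(x) / b - math.log(2.0 * b)

def integrate(func, lo=-40.0, hi=40.0, n=40_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def entropy(log_pdf):
    """S = -integral of f ln f."""
    return integrate(lambda x: -math.exp(log_pdf(x)) * log_pdf(x))

def entropy_gap(log_pdf, log_ref):
    """Integral of f ln(f / f_ref); nonnegative for normalized densities."""
    return integrate(
        lambda x: math.exp(log_pdf(x)) * (log_pdf(x) - log_ref(x))
    )

VAR = 1.0  # equal dispersion stands in for the equal-mean-energy condition

def reference(x):
    return log_gauss(x, VAR)    # assumed most chaotic state

def ordered(x):
    return log_laplace(x, VAR)  # hypothetical more ordered state

s_ref = entropy(reference)
s_ord = entropy(ordered)
gap = entropy_gap(ordered, reference)
print(f"S_ref - S_ord = {s_ref - s_ord:.4f}, relative-entropy form = {gap:.4f}")
```

Because both densities have the same dispersion, the entropy difference and the logarithmic-ratio integral agree, which is the structure of the inequality in the S-theorem: the difference is nonnegative and vanishes only when the two distributions coincide.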

It is necessary to remember that the oscillatory model
used above does not concern only the special case of a
real physical oscillator; the model was considered in a
much more general sense. The parameter L in the previous
formulas is some general length parameter. If L is the
size of the system, then the relation between L and $\omega_0$
allows one to use the oscillatory model, as an example,
for the description of free particle motion (Kadomtsev
1994; Klimontovich 1995, 1997 (in press)).

VIII.  Information 

There are two different statistical definitions of
Shannon information. The first one coincides in form
with the Boltzmann definition of entropy.
If f(X) is any dimensionless distribution function of
the dimensionless variable X, then the Shannon
information (entropy, or "S-information") is defined by
the expression (Haken 1988; Kadomtsev 1994)

$$I[X] = S[X] = -\int f(X)\ln f(X)\,dX, \tag{47}$$

or for discrete variables

$$I[n] = S[n] = -\sum_n f_n \ln f_n. \tag{48}$$

Although in many cases the calculation of the
S-information is certainly useful, it does not reflect
the existence of self-organization in open systems
(Klimontovich 1990 (1991), 1995).

To define the change of information in processes
in open systems it is better to use a more general
definition of information (Stratonovich 1975; Klimontovich
1982 (1986))

$$I[X|Y] = S[X] - S[X|Y]. \tag{49}$$

Here S[X] is the ordinary Boltzmann-Shannon entropy

$$S[X] = -\int f(X)\ln f(X)\,dX \tag{50}$$

and S[X|Y] is the conditional entropy. It is connected
with the conditional distribution function f(X|Y)
(f(X,Y) = f(X|Y) f(Y)) by the relation

$$S[X|Y] = -\int f(X,Y)\ln f(X|Y)\,dX\,dY. \tag{51}$$

The expression for the information can be
presented in a more symmetrical form

$$I[X|Y] = \int \ln\frac{f(X,Y)}{f(X)f(Y)}\,f(X,Y)\,dX\,dY \ge 0. \tag{52}$$

We see that the information defined in this way
is nonnegative. The equality corresponds to the case when
the quantities X, Y are statistically independent.
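
The two forms of the information, the entropy difference and the symmetrical logarithmic form, can be checked on a small discrete example. This is a minimal sketch, assuming discrete variables in place of the continuous ones; the joint probability tables are hypothetical.

```python
import math

def entropy(p):
    """Boltzmann-Shannon entropy of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def marginals(joint):
    """Marginal distributions of X (rows) and Y (columns)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def conditional_entropy(joint):
    """S[X|Y] = -sum f(x,y) ln f(x|y), with f(x|y) = f(x,y) / f(y)."""
    _, py = marginals(joint)
    total = 0.0
    for row in joint:
        for pxy, p_y in zip(row, py):
            if pxy > 0.0:
                total -= pxy * math.log(pxy / p_y)
    return total

def information(joint):
    """I[X|Y] = S[X] - S[X|Y]."""
    px, _ = marginals(joint)
    return entropy(px) - conditional_entropy(joint)

def information_symmetric(joint):
    """I[X|Y] = sum f(x,y) ln( f(x,y) / (f(x) f(y)) )."""
    px, py = marginals(joint)
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0.0
    )

correlated = [[0.4, 0.1], [0.1, 0.4]]         # hypothetical dependent pair
independent = [[0.25, 0.25], [0.25, 0.25]]    # product of uniform marginals

print(information(correlated), information_symmetric(correlated))
print(information(independent))
```

For the dependent table both forms give the same positive value, and for the product table the information vanishes, in line with the statement that equality corresponds to statistical independence.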

IX. Conservation law of entropy and information

Let the distribution function f(Y) be completely
defined by the first moment

$$f(Y) = \delta(Y - a). \tag{53}$$

Here a is any characteristic parameter. In open systems
it can play the role of a control (governing) parameter.
Substitute this function into the expressions (51), (50).
After integrating over Y we obtain the following
expression for the information

$$I[X|a] = S[X] - S[X|a] = S[X] + \int f(X|a)\ln f(X|a)\,dX. \tag{54}$$

If the unconditional entropy S[X] does not depend
on the values of the control parameter a, then, using the
last equality, it is possible to obtain the corresponding
relationship between the information and the entropy
for any two values of the control parameter

$$I[X|a_1] - I[X|a_2] = S[X|a_2] - S[X|a_1]. \tag{55}$$

From it follows the conservation law for the sum of
information and entropy

$$I[X|a] + S[X|a] = \mathrm{const}. \tag{56}$$

We see that this law holds for the cases when the
unconditional entropy S[X] does not depend on the value
of the control parameter.

The paper (Kadomtsev 1994) represents the conservation
law as the sum of the unconditional information
and entropy

$$I[X] + S[X] = \mathrm{const}, \tag{57}$$

but such a presentation of the conservation law
contradicts the definition (47) of the unconditional
information and entropy.
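
The conservation law (56) for the conditional quantities can be illustrated with discrete distributions. The sketch below assumes, as the text requires, an unconditional entropy that does not depend on the control parameter; the value ln 4 and the three conditional distributions are hypothetical.

```python
import math

def entropy(p):
    """Boltzmann-Shannon entropy of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def information(s_unconditional, conditional):
    """I[X|a] = S[X] - S[X|a] for one value of the control parameter."""
    return s_unconditional - entropy(conditional)

# Assumed unconditional entropy, independent of the control parameter a.
S_X = math.log(4.0)

# Hypothetical conditional distributions f(X|a) for three values of a.
f_given_a = {
    1.0: [0.25, 0.25, 0.25, 0.25],
    2.0: [0.40, 0.30, 0.20, 0.10],
    3.0: [0.70, 0.10, 0.10, 0.10],
}

info = {a: information(S_X, f) for a, f in f_given_a.items()}
sums = {a: info[a] + entropy(f) for a, f in f_given_a.items()}

for a in f_given_a:
    print(f"a = {a}: I = {info[a]:.4f}, I + S = {sums[a]:.4f}")
```

The sum stays equal to the unconditional entropy for every value of the parameter, while the information grows as the conditional entropy falls, which is the content of relation (55).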

As an example we can use the conservation law (56)
for quantum systems. Let $S[X] \to S_0[x,p]$ be the
unconditional entropy for the state with the sign "="
in the Heisenberg uncertainty relation. It corresponds
to the ground state of the system at the temperature
T = 0. Let also $\tilde{S}[x,p]$ be the renormalized entropy
for the state with the sign "=" in the Heisenberg
uncertainty relation, but for the temperature T > 0, and, at
last, let S[x,p,t] = S[x,t] + S[p,t] be the entropy of any
nonequilibrium or stationary state. The corresponding
notation will also be used for the conditional
information. Then, in accordance with the inequality (46)
(in accordance with the S-theorem), we have the relation

$$I[x,p,t] - \tilde{I}[x,p] = \tilde{S}[x,p] - S[x,p,t] \ge 0. \tag{58}$$

Thus, at the transition from the more chaotic state to
the more ordered state the amount of information
increases.

As in the classical theory, the S-theorem can now be
used for the diagnostics of quantum systems and, thus,
to solve one of the most important problems of
Semiotics (Meystel 1995; Coombs and Sulcoski (Eds.) 1996).

X.  References 

[1] Belinski A.V., Klyshko D.N. 1993: The Interference
of Light and the Bell's Theorem (in Russian).
Uspechi Fiz. Nauk, 163, 1.

[2] Bell J.S. 1965: On the Einstein-Podolsky-Rosen.
Physics, 1, 165.

[3] Bell J.S. 1965: On the Problem of Hidden Variable
in Quantum Mechanics. Rev. Mod. Phys. 38, 447.

[4] Brittin W.E., Chapell W.R. 1961. The Wigner
Distribution Function and Second Quantization
in Phase Space. Rev. Mod. Phys. 34, 620.

[5] Bohm D. On the possible Interpretation of Quantum
Mechanics on the Basis of Concept of "Hidden
Parameters". Phys. Rev. 85, 166, 180.

[6] Coombs M., Sulcoski M. (Eds.) 1996. Control
Mechanisms for Complex Systems. Las Cruces,
New Mexico.

[7] De Broglie L. 1953: La Physique Quantique
Restera-t-elle Indéterministe? Paris.

[8] Dirac P.A.M. Directions in Physics. John Wiley
and Sons, New York, 1978.

[9] Einstein A. 1916: Strahlungs-Emission und
-Absorption nach der Quantentheorie.
Verhandl. Dtsch. Phys. Ges. 18, 318.

[10] Einstein A., Podolsky B., Rosen N. 1935: Can
Quantum-Mechanical Description of Physical
Reality Be Considered Complete?
Phys. Rev. 47, 777.

[11] Haken H. Information and Self-organization.
Springer, Heidelberg, Berlin, New York, 1988.

[12] Hillery M., O'Connell R.F., Scully M.O., Wigner
E.P. 1984. Distribution functions in physics:
Fundamentals. Physics Reports 106, 121-167.

[13] Kadomtsev B.B. 1994: Dynamics and Information.
Uspechi Fiz. Nauk 164, 449.

[14] Klimontovich Yu.L. 1958. On the Method of
"Second Quantization" in Phase Space. Soviet
Physics JETP 6 (33), 752.

[15] Klimontovich Yu.L. and Silin V.P. 1960 (1962).
On the Spectra of Systems of Interacting Particles
and the Collective Losses on Passage of Particles
Through Matter. Uspechi Fiz. Nauk 70, 247;
Fortschr. Physik 10, 389.

[16] Klimontovich Yu.L. 1982 (1986). Statistical
Physics. Nauka, Moscow, 1982; Harwood Academic
Publishers, New York.

[17] Klimontovich Yu.L. 1980 (1983). The Kinetic
Theory of Electromagnetic Processes. Nauka,
Moscow; Springer, Berlin, Heidelberg, New York.

[18] Klimontovich Yu.L. 1983. Entropy Decrease in
the Processes of Self-Organization (in Russian).
Pis'ma v ZhTF 9, 1089.

[19] Klimontovich Yu.L. 1984. Entropy and Entropy
Production in the Laminar and Turbulent Flows
(in Russian). Pis'ma v ZhTF 10, 80.

[20] Klimontovich Yu.L. 1990, 1991. Turbulent Motion
and Structure of Chaos. Nauka, Moscow; Kluwer
Academic Publishers, Dordrecht.

[21] Klimontovich Yu.L. 1993: To the Statistical
Ground of Schroedinger Equation (in Russian).
TMF, 97, 3.

[22] Klimontovich Yu.L. 1995: Statistical Theory of
Open Systems. Nauka, Moscow; Kluwer Academic
Publishers, Dordrecht, 1995.

[23] Klimontovich Yu.L. 1996: Relative ordering
criteria in open systems. Uspechi Fiz. Nauk 166, 1231.

[24] Klimontovich Yu.L. 1997: To Kinetic Theory of
Collisionless Plasma. Uspechi Fiz. Nauk, 167, 23.

[25] Klimontovich Yu.L., Kremp D. 1981: Quantum
Kinetic Equation with Bound States. Physica A,
109, 512.

[26] Klimontovich Yu.L., Kremp D., Kraeft W. 1987:
Kinetic Theory for Chemically Reacting Gases
and Partially Ionized Plasmas.
Adv. Chem. Phys. 58, 175.

[27] Klimontovich Yu.L., Wilhelmsson H., Yakimenko
I.P., Zagorodnii A.G. 1989: Statistical Theory of
Plasma Molecular Systems. Phys. Rev. 175, 264.

[28] Mehra J. 1975: The Solvay Conferences on
Physics. D. Reidel, Dordrecht, Boston.

[29] Meystel A. 1995. Semiotic Modeling and Situation
Analysis. AdRem, Inc.

[30] Neumann J. Mathematische Grundlagen der
Quantenmechanik. Springer, Berlin.

[31] Prigogine I. 1980: From Being to Becoming.
Freeman, San Francisco.

[32] Prigogine I., Stengers I. 1984: Order out of
Chaos. Heinemann, London.

[33] Stratonovich R.L. 1975. Theory of Information
(in Russian). Moscow "Sov. Radio".




Information Granulation and its Centrality in Human and Machine Intelligence


Lotfi  A.  Zadeh 

Abstract 


In  a  general  setting,  granulation  involves 
a  partitioning  of  a  real  or  mental  object  into 
granules,  with  a  granule  being  a  clump  of  points 
(objects) drawn together by indistinguishability,
similarity,  proximity  or  functionality. 

Granulation  is  crisp  or  fuzzy  depending  on 
whether  the  granules  are  crisp  or  fuzzy.  In  the  case 
of  age,  for  example,  the  time-intervals  {1,  2,  3,  ... 
,130}  are  crisp,  whereas  the  fuzzy  time-intervals 
{very  young,  young,  middle-aged,  old,  very  old} 
are  not. 
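
The contrast between crisp and fuzzy granulation of age can be sketched in code. This is only an illustration: the trapezoidal membership functions and all breakpoints below are hypothetical choices, not values given in the text.

```python
import math

def crisp_granule(age):
    """Index k of the one-year crisp interval [k-1, k) containing age."""
    return min(130, max(1, int(math.floor(age)) + 1))

def trapezoid(x, a, b, c, d):
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical breakpoints (in years) for the fuzzy granules of age.
FUZZY_GRANULES = {
    "very young": (-1.0, 0.0, 5.0, 12.0),
    "young": (5.0, 12.0, 30.0, 40.0),
    "middle-aged": (30.0, 40.0, 55.0, 65.0),
    "old": (55.0, 65.0, 75.0, 85.0),
    "very old": (75.0, 85.0, 130.0, 131.0),
}

def fuzzy_granules(age):
    """Degree of membership of age in every fuzzy granule."""
    return {
        label: trapezoid(age, *params)
        for label, params in FUZZY_GRANULES.items()
    }

memberships = fuzzy_granules(35.0)
print(crisp_granule(35.0), memberships)
```

A crisp granulation assigns the age 35 to exactly one interval, whereas under these assumed membership functions the same age belongs partly to "young" and partly to "middle-aged".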

Granulation -- and especially information
granulation (IG) -- is ubiquitous in human actions
and  cognition.  We  employ  granulation  when  we 
speak,  write,  eat,  walk  and  analyze  an  image;  we 
employ  fuzzy  information  when  we  partition  a 
human  body  into  body  parts:  head,  neck,  chest, 
arms,  legs,  etc.;  and  more  generally,  we  apply 
granulation  when  we  partition  a  complex  problem 
into  simpler  subproblems. 

Modes  of  information  granulation  in 
which  the  granules  are  crisp  (crisp  IG)  play 
important  roles  in  a  wide  variety  of  methods, 
approaches  and  techniques.  Among  them  are: 
interval  analysis,  quantization,  rough  set  theory, 
diakoptics,  divide  and  conquer,  Dempster-Shafer 
theory,  machine  learning  from  examples, 
chunking,  qualitative  process  theory,  decision 
trees,  semantic  networks,  analog-to-digital 
conversion,  constraint  programming,  cluster 
analysis  and  many  others. 

Important  though  it  is,  crisp  IG  has  a 
major  blind  spot.  More  specifically,  it  fails  to 
reflect  the  fact  that  in  much  —  perhaps  most  —  of 
human  reasoning  and  concept  formation  the 
granules  are  fuzzy  rather  than  crisp.  For  example, 
the  fuzzy  granules  of  a  human  head  are  the  nose, 
ears,  forehead,  hair,  cheeks,  etc.  Each  of  the  fuzzy 
granules  is  associated  with  a  set  of  fuzzy  attributes, 
e.g.,  in  the  case  of  the  fuzzy  granule  hair,  the  fuzzy 
attributes  are  color,  length,  texture,  etc.    In  turn, 


each  of  the  fuzzy  attributes  is  associated  with  a  set 
of  fuzzy  values.  Specifically,  in  the  case  of  the 
fuzzy  attribute  length(hair),  the  fuzzy  values  are 
long,  short,  not  very  long,  etc.  The  fuzziness  of 
granules,  their  attributes  and  their  values  is 
characteristic  of  the  ways  in  which  human  concepts 
are  formed,  organized  and  manipulated. 

In  human  cognition,  fuzziness  of  granules 
is  a  direct  consequence  of  fuzziness  of  the  concepts 
of  indistinguishability,  similarity  and 
functionality.  Furthermore,  it  is  entailed  by  the 
finite  capacity  of  the  human  mind  to  store 
information  and  resolve  detail.  In  this  perspective, 
fuzzy  information  granulation  (fuzzy  IG)  may  be 
viewed as a form of lossy data compression.

Fuzzy  information  granulation  underlies 
the  remarkable  human  ability  to  make  rational 
decisions  in  an  environment  of  imprecision, 
uncertainty  and  partial  truth.  And  yet,  despite  its 
intrinsic  importance,  fuzzy  information  granulation 
has  received  scant  attention  except  in  the  context  of 
fuzzy  logic,  in  which  fuzzy  IG  underlies  the  basic 
concepts  of  linguistic  variable,  fuzzy  rule-set  and 
fuzzy  graph.  In  fact,  the  effectiveness  and 
successes  of  fuzzy  logic  in  dealing  with  real-world 
problems  rest  in  large  measure  on  the  use  of  the 
machinery  of  fuzzy  information  granulation.  This 
machinery  is  unique  to  fuzzy  logic. 

Recently  fuzzy  information  granulation 
has  come  to  play  a  central  role  in  the  methodology 
of  computing  with  words  (CW).  More 
specifically,  in  a  natural  language  words  play  the 
role  of  labels  of  fuzzy  granules.  In  computing 
with  words,  a  proposition  is  viewed  as  an  implicit 
fuzzy  constraint  on  an  implicit  variable.  The 
meaning  of  a  proposition  is  the  constraint  which  it 
represents. 

In  CW,  the  initial  data  set  (IDS)  is 
assumed  to  consist  of  a  collection  of  propositions 
expressed  in  a  natural  language.  The  result  of 
computation -- referred to as the terminal data set
(TDS) -- is likewise a collection of propositions
expressed  in  a  natural  language.  To  infer  TDS 
from  IDS  the  rules  of  inference  in  fuzzy  logic  are 
used  for  constraint  propagation  from  premises  to 
conclusions. 

There  are  two  main  rationales  for 
computing  with  words.  First,  computing  with 
words  is  a  necessity  when  the  available 
information  is  not  precise  enough  to  justify  the 
use  of  numbers.  And  second,  computing  with 
words  is  advantageous  when  there  is  a  tolerance  for 
imprecision,  uncertainty  and  partial  truth  that  can 
be  exploited  to  achieve  tractability,  robustness, 
low  solution  cost  and  better  rapport  with  reality. 
In  coming  years,  computing  with  words  is  likely 
to  evolve  into  an  important  methodology  in  its 
own  right  with  wide-ranging  applications  on  both 
basic  and  applied  levels. 

Inspired  by  the  ways  in  which  humans 
granulate  human  concepts,  we  can  proceed  to 
granulate  conceptual  structures  in  various  fields  of 
science.  In  a  sense,  this  is  what  motivates 
computing  with  words.  An  intriguing  possibility 
is  to  granulate  the  conceptual  structure  of 
mathematics.  This  would  lead  to  what  may  be 
called  granular  mathematics.  Eventually,  granular 
mathematics  may  evolve  into  a  distinct  branch  of 
mathematics  having  close  links  to  the  real  world. 


A  subset  of  granular  mathematics  and  a  superset  of 
computing  with  words  is  granular  computing. 

In  the  final  analysis,  fuzzy  information 
granulation  is  central  to  human  reasoning  and 
concept  formation.  It  is  this  aspect  of  fuzzy  IG 
that  underlies  its  essential  role  in  the  conception 
and  design  of  intelligent  systems.  What  is 
conclusive  is  that  there  are  many,  many  tasks 
which  humans  can  perform  with  ease  and  that  no 
machine  could  perform  without  the  use  of  fuzzy 
information granulation. This conclusion has a
thought-provoking implication for AI: Without
the methodology of fuzzy IG in its
armamentarium, AI cannot achieve its goals.

Abstract

This  paper  describes  the  role  of  semiotic  descriptors  in 
representation  of  knowledge  by  relational  structures. 

Fuzzy  relations  can  manipulate  semantic  information 
that  is  carried  by  linguistic  labels.  The  conditions  imposed 
on  logic  operations  applied  to  fuzzy  relational  systems  can 
be  strictly  mathematically  defined.  But  the  logic  seman- 
tic is  not  sufficient  to  deal  satisfactorily  with  the  meaning 
of  linguistic  labels  that  carry  the  conceptual  meaning  of 
applications.  It  has  to  be  supplemented  by  the  notion  of 
semiotic  descriptors.  These  can  be  expressed  as  alge- 
braic restrictions  over  the  basic  fuzzy  relational  system  [5]. 

Semiotic  Fuzzy  Knowledge  Representation  Structure 
consists  of  the  pair  of  structures,  namely  <  FRS,  SD  >. 
FRS is a Fuzzy Relational Structure consisting of a family
of fuzzy relations, and SD is a collection of semiotic
descriptors [4].

After  describing  the  general  semiotic  model,  we  shall 
also  show  a  specific  application  -  use  of  Semiotic  Fuzzy 
Knowledge  Representation  approach  in  study  of  cost  and 
affordability in engineering design. Specific examples to be
presented are taken from the domain of the aeronautic
industry. BK-products of relations [7] and fast fuzzy relational
algorithms [1] are the technical tools by which we extract
meaning from the answers to specific questions presented
to engineers in our case study.


1  Introduction 

This  paper  describes  the  role  of  semiotic  descriptors 
in  representation  of  knowledge  by  relational  structures. 
Fuzzy  relations  can  manipulate  semantic  information  that 
is  carried  by  linguistic  labels.  The  conditions  imposed  on 
logic  operations  applied  to  fuzzy  relational  systems  are 
strictly  mathematically  defined.  But  the  logic  semantic 
is  not  sufficient  to  deal  satisfactorily  with  the  meaning  of 
linguistic  labels  that  carry  the  conceptual  meaning  of  ap- 
plications.  It  has  to  be  supplemented  by  some  semiotic 


notions  that  can  be  expressed  as  algebraic  restrictions  over 
the  basic  fuzzy  relational  system. 

Semiotic  Fuzzy  Knowledge  Representation  Structure 
consists  of  the  pair  of  structures,  namely  <  FRS,  SD  >. 
FRS is a Fuzzy Relational Structure consisting of a family of
fuzzy relations, and SD is a collection of semiotic descriptors
[4], [5].

Semiotic  descriptors  are  obtained  by  exploratory  knowl- 
edge elicitation  [6].  We  use  repertory  grids  to  elicit  the 
meaning as used by human experts [3]. Once the relevant
semiotic  descriptors  are  identified,  relationships  between 
them  can  be  captured  by  repertory  grids  using  bi-polar 
constructs,  each  construct  consisting  of  a  pair  of  semiotic 
descriptors. 

After  the  repertory  grids  are  applied  to  a  particular 
problem,  and  the  semantic  relationship  captured  by  the 
grids,  the  grids  are  transformed  into  fuzzy  relations  which 
relate  the  semiotic  descriptors. 

Relational  methods  of  analysis  are  then  applied  in  or- 
der to  discover  meaningful  conceptual  structures  implicit 
in  these  fuzzy  relations.  This  is  done  by  forming  fuzzy 
relational  products  and  further  processing  thus  obtained 
composed  relations  by  algorithms  computing  closures  and 
interiors  of  fuzzy  relations. 
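
The closure computation mentioned above can be sketched as follows. This is a minimal illustration, assuming max-min composition and a small hypothetical relation between three descriptors; it is not the Fast Fuzzy Relational Algorithms cited in the paper.

```python
def compose(r, s):
    """Max-min composition of two fuzzy relations given as nested lists."""
    inner = len(s)
    cols = len(s[0])
    return [
        [max(min(row[j], s[j][k]) for j in range(inner)) for k in range(cols)]
        for row in r
    ]

def union(r, s):
    """Elementwise maximum of two fuzzy relations of equal shape."""
    return [[max(a, b) for a, b in zip(ra, rs)] for ra, rs in zip(r, s)]

def transitive_closure(r):
    """Smallest max-min transitive relation containing r."""
    closure = r
    while True:
        extended = union(closure, compose(closure, r))
        if extended == closure:
            return closure
        closure = extended

# Hypothetical reflexive fuzzy relation between three semiotic descriptors.
relation = [
    [1.0, 0.8, 0.0],
    [0.0, 1.0, 0.6],
    [0.0, 0.0, 1.0],
]

closure = transitive_closure(relation)
print(closure)
```

The loop stops because max and min never create new membership values, so only finitely many relations can be reached. In the example the closure adds the indirect link from the first to the third descriptor with degree 0.6.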

Particularly  useful  structural  relationships  are  equiva- 
lences, similarities  and  preorders  between  individual  semi- 
otic descriptors  captured  by  the  bi-polar  repertory  grids. 
These  structural  properties  relating  the  meaning  of  con- 
cepts intrinsically  contained  in  the  data  captured  by  reper- 
tory grids  must  be  made  explicit  by  appropriate  relational 
computations. The computational algorithms for this
purpose are based on BK-Products of fuzzy relations and Fast
Fuzzy Relational Algorithms [8]. These computational tools are
used to identify relational structures and properties
intrinsically contained in data.
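
A triangle subproduct of the BK type can be sketched as below. The sketch assumes the Lukasiewicz implication and offers both the minimum ("harsh") and the arithmetic-mean aggregation; the paper does not state here which implication operator it uses, and the grid values are hypothetical.

```python
def lukasiewicz_implication(a, b):
    """Lukasiewicz many-valued implication a -> b (an assumed choice)."""
    return min(1.0, 1.0 - a + b)

def transpose(r):
    """Converse of a relation stored as nested lists."""
    return [list(col) for col in zip(*r)]

def triangle_subproduct(r, s, implication=lukasiewicz_implication, mean=False):
    """Degree to which the afterset of i under r is contained in the
    foreset of k under s, aggregated over the intermediate elements j."""
    inner = len(s)
    cols = len(s[0])
    product = []
    for row in r:
        out_row = []
        for k in range(cols):
            degrees = [implication(row[j], s[j][k]) for j in range(inner)]
            out_row.append(sum(degrees) / inner if mean else min(degrees))
        product.append(out_row)
    return product

# Hypothetical grid: two descriptors (rows) rated on three constructs.
grid = [
    [1.0, 0.5, 0.0],
    [1.0, 1.0, 0.5],
]

harsh = triangle_subproduct(grid, transpose(grid))
averaged = triangle_subproduct(grid, transpose(grid), mean=True)
print(harsh)
print(averaged)
```

Composing the grid with its own converse compares descriptors with each other: entry (i, k) measures how far the ratings of descriptor i are included in those of descriptor k. In the example the first descriptor is fully included in the second but not the other way round, which is the kind of preorder structure the text refers to.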

After describing the general semiotic model, we shall
also show a specific application - use of Semiotic Fuzzy
Knowledge Representation approach in study of cost and
affordability in engineering design. Specific examples to
be presented are taken from the domain of the aeronautic
industry.

A  specific  example  on  which  we  demonstrate  our  meth- 
ods will  be  knowledge  elicited  from  human  experts  char- 
acterising an  ingot  process  of  the  Lower  Pressure  Turbine 
(LPT)  Cover  Plate.  The  data  of  this  example  is  evaluated 
by  means  of  fuzzy  relational  subproducts  with  respect  to 
the  structural  relationships  of  cost  factors.  Crucial  to  this 
approach  is  the  use  of  semiotic  descriptors  that  allow  for 
linguistic  interpretation  of  relational  computations.  These 
descriptors  potentially  denote  the  technological  and  fiscal 
entities  and  concepts  that  are  relevant  to  the  problem  of 
affordability. 

2  Semiotic Descriptors, Granularity and Meaning

2.1  Semiosis  and  Interpretation 

The main goal of semiosis is interpretation of signs and symbols - recognition of their meaning. The meaning interpretation unit is a very complicated one. The meaning of something can differ depending on the scale used for its representation. This makes the notion of granularity a very important one. Levels of granularity, also referred to as levels of resolution, are closely related to generalization.

Interpretation  and  meaning  depend  on  context.  A  pro- 
cess considered  with  all  its  details  cannot  be  properly  un- 
derstood unless  the  details  irrelevant  within  a  particular 
context  or  perspective  are  brushed  away.  But  this  inter- 
pretation will  often  be  incomplete.  With  incompleteness 
at  work,  there  may  be  more  than  one  interpretation.  In- 
deed, a  whole  family  of  interpretations  may  be  possible, 
some  of  which  may  be  conflicting.  Fuzzy  sets,  relations 
and  logic  can  play  an  important  role  here:  they  allow  us, 
through  the  theory  of  potentiality  (or  virtual  plurality)  to 
deal  with  the  whole  family  of  virtual  outcomes,  and  also 
to measure the degree of conflict of individual members of some possibilistic family of outcomes produced by the meta-process of interpretation.

2.2  Object  Emergence 

The  phenomenon  of  object  emergence  is  linked  with  for- 
mation of  crisp  and  fuzzy  classes  (generalized  groupings). 
Indeed, here we deal with the logic theory of fuzzy relations, which can be used to expose the inadequacy of the currently used logical structure of crisp (i.e. non-fuzzy) objects. Logically, generalization is a process in which, by means of grouping together relevant properties (intensional specifications) of objects, we create new objects - structures (given as such by extension) by new intensional specification. So OOP objects can be viewed as special cases of the pragmatics of groupings, the semantics of which is given by many-valued logic based relations with special meta-properties.

2.3    Triadic  Distinction  of  Semiotics:  Syntax,  Semantics 
and  Pragmatics 

One of the classic notions of semiotics is the triadic distinction syntax - semantics - pragmatics, coined by Morris in the late 1930s. That is the place where the duality Semiotics - Mathematics comes in. In logic methods of proofs, only the "form", the syntactic composition, plays a role. In the so-called logic theory of models, the primary goal is to interpret syntax in semantic meta-structures. But what we need here is the fully fledged duality syntax - semantics, and the pragmatics of emergence of either of these. Here fuzzy logic can play an important role: we have the duality of linguistic descriptors and fuzzy structures to which these descriptors apply. We also have the duality symbolic vs. numerical, both sides of which are addressed by what Zadeh calls "the fuzzy logic in wider sense". "Numerical" in our relational setting will be extended to the many-valued logic valuation of relational structure, and "symbolic" will be represented by semiotic descriptors, which are special instances of linguistic labels.

Now, we have to provide the link of the Semiotic Fuzzy Knowledge Representation Structure (SFKRS) with the semiotic triangle. The Semiotic Fuzzy Knowledge Representation Structure consists of a pair of structures, namely <FRS, SD>. FRS is a Fuzzy Relational Structure consisting of a family of fuzzy relations, and SD is a collection of semiotic descriptors [4], [5].

The semiotic triangle has three vertices, namely name, meaning and represented object. The vertex "name" of the triangle maps into a semiotic descriptor SD, while the vertex "meaning" is represented by a Fuzzy Relational Structure (FRS). The vertex "object" maps into the real-world object whose characterisation is captured by the pair <FRS, SD> of SFKRS. See Figure 1 for further details.

3    Semiotic  Relational  Model 

3.1    Representing Activities in the Domain of Manufacturing in the Aeronautic Industry

In this second part of our paper, we look at a specific relational model from the domain of manufacturing in the aeronautic industry. First we give an overview of the global relational structure in terms of semiotic descriptors SD. This is followed by a section that shows how the Semiotic Fuzzy Knowledge Representation Structure SFKRS (which consists of a pair of structures, namely <FRS, SD>) is further completed by eliciting the fuzzy membership values of FRS by repertory grids. Repertory grids are presented to human experts, in this case engineers, who fill in the grids. The grids are then turned into fuzzy relations, which are further processed by relational computations by which the meaning within a context is extracted.

[Figure 1. Semiotic Triangle. Vertices: Semiotic Descriptor (Name Space); Meaning within a perspective or a context (Conceptual Characterization); Object with all its relationships (Physical Reality).]

We have noted previously that FRS is a Fuzzy Relational Structure consisting of a family of fuzzy relations, and SD is a collection of semiotic descriptors.
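The step from a filled-in repertory grid to a fuzzy relation can be illustrated with a minimal sketch. It assumes, purely for illustration, that each engineer rates each bipolar construct on a 1 to 5 scale and that ratings are mapped linearly onto [0, 1]; the rating scale, the function name `grid_to_fuzzy_relation` and the sample data are hypothetical and are not taken from the study.

```python
def grid_to_fuzzy_relation(grid, low=1, high=5):
    """Map bipolar ratings in [low, high] linearly to membership degrees in [0, 1].

    `grid` maps a construct (semiotic descriptor) to the ratings given by
    each assessor. The linear scaling is an assumption of this sketch.
    """
    span = high - low
    return {
        (construct, assessor): (rating - low) / span
        for construct, row in grid.items()
        for assessor, rating in row.items()
    }


# Hypothetical grid: two constructs rated by two engineers.
grid = {
    "C2": {"eng_1": 5, "eng_2": 4},
    "C9": {"eng_1": 4, "eng_2": 2},
}
relation = grid_to_fuzzy_relation(grid)
```

The resulting dictionary is one fuzzy relation from constructs to assessors; a family of such relations would form the FRS part of the pair <FRS, SD>.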

3.2    Conceptual  Categories  of  Semiotic  Descriptors 

The relational model presented in this case study is intended for capturing and representing the features relevant to affordability analysis and prediction. It consists of the following conceptual categories of relations:

•  Objects, 

•  attributes, 

•  values, 

•  agents, 

•  perspectives, 

•  contexts, 

•  views. 

Each conceptual category listed above represents a specific level of granularity. Each level of granularity may possess conceptual refinements that are interlinked with other granular structures. Each conceptual category has a specific meaning that can be understood and interpreted by a domain expert.

Objects are, for example, either parts of a manufactured product, manufactured products or whole technologies, depending on the resolution level.




Attributes are characterised by linguistic descriptors. Examples of these are: low_raw_material_cost, small_processing_windows, high_temperature, good_lubricity, low_variance_in_raw_material_costs, etc. But attributes can also be characterised by measurable physical or fiscal parameters, such as: temperature, lubricity, cost_reducing_potential, potential_investment, cost, etc.

Interactions (special kinds of relations), for example:

REL_1,5: low_variance_in_raw_material_costs → low_cracking_probability

REL_2,3: good_processing_control → low_raw_material_cost

REL_3,7: low_raw_material_cost → common_standard_material_alloy.

Values.  These  are  either  linguistic  variables  or  numer- 
ical variables  determining  the  degree  to  which  an  object 
possesses  an  attribute. 

Agents. In the context of this project, agents are the observers (e.g. engineers or accountants) assessing the degree to which an attribute is possessed by an object. For example, in [R4], [R8], [R10] describing the evaluation of an LPT cover plate, the observers were engineers evaluating to what degree various attributes can be assigned to the LPT plate.

Perspectives. An object or a family of objects can be evaluated within different perspectives. For example, an LPT cover plate can be evaluated from the perspective of an engineer, or from the perspective of a business analyst performing value analysis of the part, or from the perspective of an accountant. Each perspective may employ attributes that are different from the attributes of a different perspective for the same object. Some attributes may, however, be shared by different perspectives.

Contexts. Each object or family of objects can appear in several different contexts. For example, an LPT cover plate may appear in the context of an ingot process, a forging process, an extrusion process, or other processes.

Views.  Even  in  one  particular  perspective  or  context, 
different  experts  may  assess  the  objects  and  situations  in 
which  objects  appear  differently.  These  differences  of  views 
of  different  experts  can  be  captured  by  repertory  grids  and 
compared  by  relational  methods  using  algorithms  provided 
by  TRYSIS. 

4    A  Case  Study:  Low  Pressure  Turbine 
(LPT)  Cover  Plate 

4.1    Semantics of the Relationships of the Relational
Model of LPT Cover Plate

4.1.1     Semiotic  Descriptors  of  Relational  Products 

In representing knowledge structures, not only quantitative but also qualitative notions are involved. Product-relations formed by the relational products represent new entities composed from the original data. Each relation possesses a meaning carried by its linguistic label. This label, which is in fact an instance of a semiotic descriptor, carries a concrete interpretation within a particular knowledge domain. This interpretation also determines the meaning of the composed relation computed by the relational product.

Our model relation M, to be used in the sequel, M ∈ R(Y × O × P), relates the following three lists (sets) of entities:

•  the  set  Y  of  process  attributes; 

•  the  set  O  of  observers,  assessors  or  measuring  sensors; 
with  the  set  G  of  process  identifiers; 

•  the  set  of  Parts  or  Components. 

Applying the usual selection and projection operators, the ternary relation is decomposed into a family of 2-ary relations in R(Y × O), indexed by the set P. The processes P1 and P2 are extrusion and forging, respectively. Each of the relations R1 and R2 (from the set of process attributes Y to the set O) is composed with its transpose R^T by means of the triangle subproduct, and the local preorder closure is computed by the TRYSIS system. The result of this computation is a relation from the process attributes Y to Y. This relation shows the dependencies of process attributes represented as a preorder relation.
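The composition of a relation with its transpose can be sketched as follows. This is a minimal illustration and not the TRYSIS implementation: it assumes the usual Bandler-Kohout form of the triangle subproduct, in which the degree for a pair (i, k) aggregates the implications R[i][j] → S[j][k] over the middle index j, and it assumes that "mean" denotes the arithmetic mean and "harsh" the minimum of those implication values. The Lukasiewicz operator is used as the example implication, and the sample matrix is hypothetical.

```python
def lukasiewicz(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def transpose(R):
    """Transpose a relation stored as a list of rows."""
    return [list(col) for col in zip(*R)]


def triangle_subproduct(R, S, implication=lukasiewicz, aggregate="mean"):
    """Degree to which row i of R is contained in column k of S."""
    n_mid = len(S)
    result = []
    for row in R:
        out_row = []
        for k in range(len(S[0])):
            degrees = [implication(row[j], S[j][k]) for j in range(n_mid)]
            if aggregate == "mean":
                out_row.append(sum(degrees) / n_mid)
            else:  # "harsh": the weakest implication decides (assumed to be min)
                out_row.append(min(degrees))
        result.append(out_row)
    return result


# Hypothetical relation: rows are process attributes, columns are observers.
R = [[0.2, 0.4],
     [0.6, 0.8]]
harsh = triangle_subproduct(R, transpose(R), aggregate="harsh")
mean = triangle_subproduct(R, transpose(R), aggregate="mean")
```

The result is a square relation on the attributes. In this toy case the entry for (attribute 0, attribute 1) is 1 while the reverse entry is lower, so the first attribute is fully contained in the second but not the other way round, which is the kind of asymmetry a preorder records.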

A sample result displaying dependences and equivalences of process attributes is shown in Figure 2. The figure shows the Hasse diagram (HD) structures displaying the preorders of process attributes computed by the fuzzy relational triangle subproduct over processes. The object processed in both processes and evaluated by the engineers is an LPT cover plate made of gamma-titanium.

A  number  of  evaluative  schemes  can  be  formulated, 
showing  inter-process  dependences,  inter-observer  depen- 
dences, etc.  Because  the  purpose  of  this  paper  is  to  show 
basic  techniques  of  relational  analysis  as  applied  to  man- 
ufacturing processes,  not  to  provide  a  detailed  analysis  of 
engineering  of  LPT  cover  plate,  further  details  of  these 
computations  are  not  presented  in  this  paper. 

Three  scenarios  for  using  a  repertory  grid  on  LPT  parts 
have  been  used  [3].  Here  it  will  suffice  to  present  just  one 
scenario,  namely 

SCENARIO  A: 

One object (LPT cover plate) is rated by a group of respondents (engineers), where each respondent assesses the object independently in a selected process. The aim is to find the dependences between process characteristics and the inter-respondent consistency.

There  may  be  several  situations  or  processes  in  which  the 
object  may  appear.  In  our  example  these  are  extrusion 
and  forging. 


4.2    Context Dependency of the Meaning of Semiotic Descriptors

In  each  context  defined  by  a  specific  industrial  process 
(e.g.  forging  or  extrusion)  there  is  a  set  of  semiotic  descrip- 
tors that  is  relevant  to  the  knowledge  representation  struc- 
ture that  has  been  created  for  a  specific  purpose.  Thus  in 
our  project  [2],  we  study  the  question  of  affordability  that 
is  related  to  the  cost  of  production.  This  determines  which 
semiotic  descriptors  will  appear  in  the  SFKRS  relevant  to 
each  industrial  process. 

For example, looking at the processes of forging and extrusion for a particular part, i.e. the LPT cover plate, we have large/small process window bipolar semiotic descriptors for extrusion and large/small process window bipolar semiotic descriptors for forging (see the list of SD in Table 1). These large/small process window descriptors are identical and appear in both processes.

This  however  does  not  mean  that  a  specific  list  of  semi- 
otic descriptors  has  the  same  meaning  in  two  different  con- 
texts. A  distinct  difference  in  the  meaning  which  a  par- 
ticular list  of  semiotic  descriptors  can  acquire  in  different 
contexts  is  clearly  shown  in  Table  1. 

Fig. 2 shows that in the context of extrusion, the semiotic descriptors large process window (C2) and long die life (C9) are captured within the same equivalence class of FRS, hence they are equivalent in their meaning. In the context of forging, on the other hand, it is different. Large process window (C2) is equivalent in its meaning to air furnace atmosphere (C5). The equivalence of large process window (C2) and long die life (C9), however, does not hold in the context of forging, despite the fact that it holds for the process of extrusion.

Let us look at these results now in semiotic terms. As can be seen from Table 1, the list of semiotic descriptors used for evaluation of the process of forging is the same as the list of the semiotic descriptors used for the process of extrusion. So in terms of Fig. 1, we use the same name space for both processes. The Hasse diagram structure of Fig. 2 computed for extrusion is different from the structure that was obtained for forging. In terms of Fig. 1, the Hasse diagram structures represent the third vertex (3) of the semiotic triangle, i.e. relational conceptual characterisation. This characterisation has been elicited experimentally from a group of human experts (engineers) by repertory grids. The fuzzy relation obtained from these repertory grids was then tested for the relational property of preorder, which can be graphically presented as a Hasse diagram.

We can also consider the SFKRS with the negative side of the bi-polar features of the semiotic descriptors. As shown in Figures 3 and 4, the SFKRS can also be formed with the negative semiotic descriptors. The SFKRS depicted in Figure 3 shows the property of contrapositive symmetry2, while the SFKRS of Figure 4 does not show it.

2 A logic proposition is contrapositive if a → b = ¬b → ¬a.

The implication operators used to compute the Hasse diagrams displayed in Figure 2 of this paper are indicated directly in the figures3.

5  Conclusion 

In this paper, we have shown by an experimental study that the semiotic triangle, which is usually applied indiscriminately, is in fact context dependent. We have demonstrated that

•  In one context, two different fragments of SFKRS elicited by questioning engineers by the repertory grid techniques may be equivalent in meaning. In another context their equivalence may not hold.

•  In two different contexts defined by two distinct industrial processes, the meaning of a set of identical semiotic descriptors may be different.

Our paper also shows that the use of basic semiotic notions such as the semiotic triangle, if properly applied, may add another dimension to engineering. It is a widespread opinion, in particular in the US, that semiotics is useful only for literary studies. We hope that we have managed to persuade our readers that semiotics enriched by fuzzy relational techniques has important applications in technology.
FRS of Extrusion:

(S,H,harsh) = (S,(H,HU,M),mean)
= (S*,(H,HU),harsh) = (S*,(H,HU),mean)
= (G43,(H,HU),harsh) = (G43,H,mean)
= (G43',(H,HU),harsh) = (G43',(H,HU),mean)
= (L,(H,HU),harsh) = (L,(H,HU),mean)

FRS of Forging:

(S,H,harsh) = (S,H,mean)
= (S*,(H,HU,M),harsh) = (S*,(H,HU,M),mean)
= (G43,(H,HU),harsh) = (G43,(H,HU),mean)
= (G43',(H,HU),harsh) = (G43',(H,HU),mean)
= (L,(H,HU),harsh) = (L,(H,HU),mean)
= (KDL,HU,harsh)

Figure 2. SFKRS (<FRS,SD>) with the positive bipolar features in the context of Extrusion and Forging [Hasse diagrams not reproduced]

3 These are defined by the following formulas. Lukasiewicz implication operator (5): a → b = min(1, 1 - a + b); Gaines-Goguen implication operator (4): a → b = min(1, b/a); and S* (Heyting-Gödel) implication operator (3): a → b = 1 if a ≤ b, b otherwise.
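The three implication operators of footnote 3 and the contrapositive symmetry property of footnote 2 can be checked numerically with a short sketch. It assumes the standard fuzzy negation ¬a = 1 - a, which the text does not state explicitly, and it tests the property only on a finite grid of truth values; the function names are illustrative.

```python
import math


def lukasiewicz(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def gaines_goguen(a, b):
    """Gaines-Goguen implication: min(1, b / a), taken as 1 whenever a <= b
    (this also covers a = 0, where b / a is undefined)."""
    return 1.0 if a <= b else b / a


def goedel(a, b):
    """S* (Heyting-Goedel) implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def is_contrapositive_symmetric(implication, steps=10):
    """Check a -> b == (1 - b) -> (1 - a) on a grid, assuming negation 1 - x."""
    grid = [i / steps for i in range(steps + 1)]
    return all(
        math.isclose(implication(a, b), implication(1.0 - b, 1.0 - a), abs_tol=1e-9)
        for a in grid
        for b in grid
    )
```

Under these assumptions the Lukasiewicz operator passes the check, while the Gaines-Goguen and Heyting-Gödel operators fail it (for instance at a = 0.5, b = 0.2), which is consistent with the remark that some of the computed structures hold the property and others do not.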
Table 1. SD both in the context of Extrusion and Forging

Positive Semiotic Descriptors            Negative Semiotic Descriptors
Symbol  Meaning                          Symbol  Meaning
C1      Capable Analytical Modeling (?)  C1      Limited Analytical Modeling (?)
C2      Large Process Window             C2      Small Process Window
C3      Low Temperature                  C3      High Temperature
C4      Good Lubricity                   C4      Low (or Difficult) Lubricity
C5      Air Furnace Atmosphere           C5      Vacuum Furnace Atmosphere
C6      Good Process Control             C6      Limited Process Control
C7      Available Tooling                C7      New Tooling
C8      Flat Die Shape                   C8      Shaped Die Shape
C9      Long Die Life                    C9      Short Die Life


FRS in Extrusion:

(S,H,harsh) = (S,(H,HU,M),mean)
= (S*,(H,HU),harsh) = (S*,(H,HU),mean)
= (G43,(H,HU),harsh) = (G43,H,mean)
= (G43',(H,HU),harsh) = (G43',(H,HU),mean)
= (L,(H,HU),harsh) = (L,(H,HU),mean)

FRS in Forging:

(S,H,harsh) = (S,H,mean)
= (S*,(H,HU,M),harsh) = (S*,(H,HU,M),mean)
= (G43,(H,HU),harsh) = (G43,(H,HU,M),mean)
= (G43',(H,HU),harsh) = (G43',(H,HU),mean)
= (L,(H,HU),harsh) = (L,(H,HU),mean)
= (KDL,HU,harsh)

Extrusion: (G43, M, mean)

Forging: (G43, M, (harsh,mean))

Figure 3. SFKRS (<FRS,SD>) with the negative bipolar features in the context of Extrusion and Forging - holding the property of Contrapositive Symmetry [Hasse diagrams not reproduced]

Figure 4. SFKRS (<FRS,SD>) with both positive and negative bipolar features in the context of Extrusion and Forging - without holding the property of Contrapositive Symmetry [Hasse diagrams not reproduced]
Abstract 

In  this  paper  an  integrated  system  is  developed 
which  can  perform  the  three  inference  methods 
introduced  by  Peirce  (1931-35)  (abduction, 
deduction  and  induction)  and  which  is  also  suitable 
for  case-based  reasoning.  The  hierarchical 
representation  of  knowledge  guarantees  an  optimal 
strategy  for  inference. 

1.  Introduction 

In  Backer  (1995)  and  Van  der  Lubbe  &  Backer 
(1995)  an  expert  system  approach  for  fuzzy  data 
analysis  was  presented.  By  means  of  learning  from 
samples  (i.e.  data  sets)  implicit  functional  relations 
are  transformed  into  explicit  knowledge  rules, 
including  the  uncertainty  associated  with  them.  On 
the basis of the knowledge rules, which are ordered hierarchically, inference is made for new samples. As such the system is capable of both inductive learning and deductive reasoning. In addition, case-based reasoning can be performed using the exceptional samples which cannot be generalized adequately to general knowledge rules.

In  the  present  paper  the  system  is  extended  to 
more  general  applications.  Furthermore,  the 
induction  part  is  enhanced  by  adapting  the  method 
proposed  by  Ho  et  al.  (1988).  Specific  attention  is 
paid  to  the  inference  mechanism,  where  a 
membership  function  should  be  assigned  to  the 
inferred  conclusion(s).  Situations  whereby  more 
than  one  knowledge  rule  will  fire  are  considered.  In 
order  to  obtain  a  system  which  can  deal  with  the 
three  semiotic  inference  types  distinguished  by  the 
American  philosopher  Charles  Sanders  Peirce 
(1931-35)  the  problem  of  abduction  in  hierarchical 
knowledge  trees  is  studied.  By  means  of  so-called 
pseudo-abduction  the  problem  of  abduction  can  be 
solved  in  a  deductive  manner. 


2:  The  reliable  training  set  and  inductive 
learning 

Let us assume that we have a set W of training samples from the entire instance space supplied by the expert. A member of the training set W is denoted by w. It corresponds to a specific sample. It can be represented by an n-dimensional attribute vector {a_1(w), ..., a_i(w), ..., a_n(w)}, expressing features of that sample. The attributes can be quantitative or qualitative. It is assumed that all qualitative attributes possess the same domain of possible modalities {v_1, ..., v_p}, like very low, low, medium, high etc. For each attribute a_i(w), the corresponding modality is denoted by v_i(w), whereby v_i(w) ∈ {v_1, ..., v_p}. And thus

w: {a_1(w) = v_1(w), ..., a_n(w) = v_n(w)}.

To each sample corresponds a decision or judgement with respect to some phenomenon. The domain of modalities of a decision is denoted by D = {d_1, ..., d_j, ..., d_m}. The learning problem now is to find the set {R} of rules that predict the correct decision or domain modality for each training sample if its attribute values given by the expert are taken into account. Rules have the form:

IF {a_1(w) = v_1(w), ..., a_n(w) = v_n(w)}
THEN {d = d_j, f_j(w)}.

This can be read as follows. If a sample has attribute values according to the conditional part of the rule, then d_j is a correct decision with membership function f_j(w). Example: IF {a_1 = high, a_3 = low}, THEN {d_2 = very low, 0.8}. Note that not all attributes are necessarily part of the conditional part of the rule.
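The rule format above can be made concrete with a minimal sketch. The `Rule` class, its field names and the dictionary encoding of a sample are assumptions made for illustration only; the example rule is the one given in the text.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """IF all listed attributes have the listed modalities THEN decision, membership."""
    conditions: dict   # attribute name -> required modality
    decision: str      # decision modality d_j
    membership: float  # membership value f_j attached to the decision

    def applies_to(self, sample):
        """A rule applies when every tied attribute matches; other attributes are ignored."""
        return all(sample.get(attr) == value for attr, value in self.conditions.items())


# The example from the text: IF {a_1 = high, a_3 = low} THEN {d_2 = very low, 0.8}.
rule = Rule({"a1": "high", "a3": "low"}, decision="very low", membership=0.8)
sample = {"a1": "high", "a2": "medium", "a3": "low"}
other = {"a1": "high", "a2": "medium", "a3": "high"}
```

Because `a2` is not tied in the conditional part, its value plays no role in deciding whether the rule applies, which mirrors the remark that not all attributes need to appear in a rule.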

Clearly, from the expert point of view, the set D = {d_1, ..., d_j, ..., d_m} can be associated with a partitioning P of training set W into m classes {P_1, ..., P_j, ..., P_m}: the concept-driven partitioning. The rule generation problem is to find for each class P_j a representative set of rules {R_j} = {R_j1, ..., R_jh, ...} that predicts d_j for all the samples of P_j as well as possible. There are two requirements on the representative set of rules: covering and discrimination. The first says that for each sample there should be a rule that can recognize this sample. Thus, there should be no samples that are not recognized. The discrimination property implies that a rule which is aimed at a specific decision is supposed not to recognize samples related to other decisions. The key concept is coverage: the coverage of a rule R_jh is the number of samples of P_j which are recognized by this rule. The coverage is denoted by P_j(R_jh). The membership function f_j(w) of a representative rule R_jh is considered to be equal to the relative coverage Ω_j(R_jh) of that rule, expressed by

Ω_j(R_jh) = P_j(R_jh) / |P_j|,

i.e. the number of samples covered by the rule relative to the total number |P_j| of training samples in the partition.

The quality of representative rule R_jh with respect to decision d_j is defined as:

Q(R_jh) = P_j(R_jh) / Σ_j' P_j'(R_jh),

which takes into account the number of samples recognized by rule R_jh which are not an element of P_j.
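Coverage, relative coverage and quality can be computed directly from their definitions. The sketch below assumes that samples are dictionaries of attribute modalities and that the concept-driven partitioning is a dictionary from decision label to its list of samples; these encodings, the function names and the toy data are illustrative assumptions.

```python
def recognizes(conditions, sample):
    """True if the sample matches every attribute tied in the rule's conditional part."""
    return all(sample.get(attr) == value for attr, value in conditions.items())


def coverage(conditions, partition):
    """Number of samples of one class that the rule recognizes."""
    return sum(1 for sample in partition if recognizes(conditions, sample))


def relative_coverage(conditions, partitions, j):
    """Coverage within class j divided by the size of class j."""
    return coverage(conditions, partitions[j]) / len(partitions[j])


def quality(conditions, partitions, j):
    """Coverage within class j divided by the coverage summed over all classes."""
    total = sum(coverage(conditions, p) for p in partitions.values())
    return coverage(conditions, partitions[j]) / total if total else 0.0


# Hypothetical concept-driven partitioning with two decision classes.
partitions = {
    "d1": [
        {"a1": "high", "a2": "low"},
        {"a1": "high", "a2": "high"},
        {"a1": "low", "a2": "low"},
        {"a1": "low", "a2": "high"},
    ],
    "d2": [
        {"a1": "high", "a2": "low"},
        {"a1": "low", "a2": "low"},
    ],
}
conditions = {"a1": "high"}
```

In this toy data the rule recognizes two of the four samples of d1 and one sample of d2, so its relative coverage for d1 is 0.5 and its quality is 2/3. A fully discriminative rule would have quality 1.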

For the generation of representative rules the method of Ho et al. (1988) can be used. One starts with the determination of the representative rule for, say, sample w_i ∈ P_j. To this end, one begins with an 'empty' rule and ties an attribute that covers the maximal number of samples of P_j. This is repeated by adding other attributes until the rule is discriminative and coverage cannot be improved. By doing so for all samples of P_j, a number of rules will be the result. The first rule that belongs to the representative set is the rule with the largest coverage. The second rule is the rule that has the largest coverage with respect to the samples of P_j which were not covered by the first rule, and so on. Rules are added one after another, until all samples of P_j are covered.

Experiments were performed by modifying the algorithm of Ho et al. (1988) in order to improve its efficiency. First of all, equally general and equally parsimonious rules are all added to the rule set. I.e. rules that cover the same training samples and tie the same number of attributes in their conditional part are thought to be equally acceptable to represent a concept or decision, as no reasonable criterion exists to decide which one is the best.

Furthermore, on rules tying more than one attribute in their conditional parts an extra parsimony check is carried out. This is necessary because of the way the attributes are tied to the rules by the original algorithm. There, only the coverage of each attribute is considered, under the expectation that this will give the best chance of achieving a general, discriminative rule in the fastest way. By leaving out the tied attributes one by one from the conditional part of the rule and checking for discrimination, redundant attributes can be detected and excluded from the rule, resulting in a more parsimonious (and therefore more general) rule. Sometimes, as in the case of equal rules as outlined above, this will even cause rules to become redundant.
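The parsimony check described above can be sketched as a loop that tentatively drops each tied attribute and keeps the shorter rule whenever it is still discriminative. This is an illustration under assumptions and not the modified algorithm itself: samples are encoded as dictionaries, `other_samples` stands for the samples belonging to other decisions, and the order in which attributes are tried is simply the order in which they were tied.

```python
def recognizes(conditions, sample):
    """True if the sample matches every attribute tied in the conditional part."""
    return all(sample.get(attr) == value for attr, value in conditions.items())


def is_discriminative(conditions, other_samples):
    """A rule is discriminative if it recognizes no sample of another decision."""
    return not any(recognizes(conditions, s) for s in other_samples)


def prune_redundant(conditions, other_samples):
    """Leave out tied attributes one by one, keeping each removal that
    preserves discrimination. The rule is never reduced to an empty one."""
    pruned = dict(conditions)
    for attr in list(conditions):
        trial = {a: v for a, v in pruned.items() if a != attr}
        if trial and is_discriminative(trial, other_samples):
            pruned = trial
    return pruned


# Hypothetical rule and samples of other decisions.
conditions = {"a1": "high", "a2": "low"}
other_samples = [
    {"a1": "low", "a2": "low"},
    {"a1": "low", "a2": "high"},
]
pruned = prune_redundant(conditions, other_samples)
```

Here `a2` turns out to be redundant, because `a1 = high` alone already excludes every sample of the other decisions. Removing a condition can only widen the set of recognized samples, so the pruned rule covers at least as many samples of its own class as before.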

Nevertheless, in general the result will still be poor, since the quality of the samples will differ. In order to rule out these effects, a data-driven partitioning is performed, which is based on the application of some numerical fuzzy clustering technique leading to maximum intra-cluster similarity and minimum inter-cluster similarity. The result of the data-driven partitioning is a hard partitioning P' into clusters, together with, for each sample, the membership value with which it belongs to each cluster. By comparing the concept-driven partitioning and the data-driven partitioning, poor samples or exceptions can be found.


Figure  1.  Reliable  training  set  and  exceptions. 
This can be simply done by ruling out those training samples that are not included in the intersection of P and P' and those samples that have no membership value to any of the clusters above a certain threshold. These samples are considered as exceptions. The corresponding set is denoted by E. The remaining samples of W together form the reliable training set W' = W \ E. The corresponding partitioning is P''. See Figure 1.

The reliable training set W' with its partitioning P'' is the proper training set on the basis of which knowledge rules are determined. This is performed according to the modified method of Ho et al. (1988), mentioned above. Once the rules are found, the system can be used for new samples for which the decision should be inferred.
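The separation of reliable samples from exceptions can be illustrated with a small sketch. It assumes that each cluster of the data-driven partitioning has already been matched to one decision class, so that agreement between the two partitionings can be tested by comparing labels, and it uses a hypothetical threshold of 0.5; neither the matching step nor the threshold value is specified in the text.

```python
def split_reliable(samples, threshold=0.5):
    """Split samples into reliable ones and exceptions.

    A sample is reliable if its expert class agrees with its hard cluster
    and at least one cluster membership exceeds the threshold.
    """
    reliable, exceptions = [], []
    for s in samples:
        agrees = s["concept"] == s["cluster"]
        confident = max(s["memberships"].values()) > threshold
        if agrees and confident:
            reliable.append(s["id"])
        else:
            exceptions.append(s["id"])
    return reliable, exceptions


# Hypothetical samples: expert class, hard cluster, and fuzzy memberships.
samples = [
    {"id": "w1", "concept": "d1", "cluster": "d1", "memberships": {"d1": 0.9, "d2": 0.1}},
    {"id": "w2", "concept": "d1", "cluster": "d2", "memberships": {"d1": 0.3, "d2": 0.7}},
    {"id": "w3", "concept": "d2", "cluster": "d2", "memberships": {"d1": 0.45, "d2": 0.5}},
]
reliable, exceptions = split_reliable(samples)
```

In this toy data w2 is an exception because the expert and the clustering disagree, and w3 is an exception because none of its memberships exceeds the threshold.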


3: Incremental inductive learning

If the rules are determined and new training samples become available, then it depends on the situation whether the rule base should be modified or not. Assume a new training sample w', for which a decision should be found. This sample is characterized by its attribute values, as follows:

w': {a_1(w') = v_1(w'), ..., a_n(w') = v_n(w')}.

The corresponding decision according to the expert's judgement is given by

J: {d = d_k, f_k(w')}.

However, the system predicts the concept d = d_k' instead of d_k, with its corresponding membership function:

S: {d = d_k', f_k'(w')}.

The following situations are distinguished. Compare Figure 2.

a) If d_k ≠ d_k', then sample w' should be considered as an exception, since system and expert come to different decisions.

b) If d_k = d_k' and D(S,J) = |f_k(w') - f_k'(w')| < 0.5, then the difference between system and expert is small, and thus this new sample is covered by the existing knowledge base and only the membership function of the rule should be modified.

c) Modifications of the rules themselves are only needed if d_k = d_k' and D(S,J) = |f_k(w') - f_k'(w')| > 0.5. The sample w' is then added to group P_k and a new representative set of rules is generated.

[Figure 2. Rule updating and identifying exceptions. Flowchart boxes: new sample w'; system (rules/exceptions); expert's judgement; add w' to list of exceptions; updating; rule set modification.]
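The three situations a), b) and c) amount to a small decision procedure, sketched below. The function name and the returned labels are illustrative. The text leaves the boundary case D(S,J) = 0.5 open; this sketch assigns it to case c), which is an assumption.

```python
def classify_new_sample(expert_decision, expert_membership,
                        system_decision, system_membership, tolerance=0.5):
    """Decide how a new sample w' affects the knowledge base."""
    # a) Expert and system disagree on the decision: treat w' as an exception.
    if expert_decision != system_decision:
        return "exception"
    distance = abs(expert_membership - system_membership)
    # b) Same decision and a small difference: adjust only the membership value.
    if distance < tolerance:
        return "update_membership"
    # c) Same decision but a large difference: add w' to its group and
    #    regenerate the representative rule set. D == tolerance also ends
    #    up here, which is an assumption of this sketch.
    return "regenerate_rules"
```

For example, an expert judgement of (d1, 0.9) against a system prediction of (d1, 0.7) falls under case b), while (d1, 0.9) against (d2, 0.9) falls under case a).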

4:  Hierarchical  knowledge  organization  and 
deductive  reasoning 

Now hierarchical knowledge organization in the
form of rule trees is considered. We assume that we
have been able to construct a nesting of
partitionings $P^0, P^1, P^2, \ldots$ The null partition $P^0$
contains all training samples. At level $l$ we have a
partitioning $P^l = \{\ldots, P^l_j, \ldots\}$, where $P^l_j$ is one of the
subsets into which the sample space is divided. The
rule set which covers partitioning $P^l$ is denoted by
$\{R^l\} = \{\ldots, R^l_j, \ldots\}$, whereby $R^l_j$ is the set of rules
related to subset $P^l_j$:

$\{R^l_j\} = \{R^l_{j1}, R^l_{j2}, \ldots\}$.

At each level $l$ it holds that:

- $\forall i \neq j:\ P^l_i \cap P^l_j = \emptyset$.

- $\sum_j |P^l_j|$ = total number of samples.

- If $P^l_j$ is split up into $P^{l+1}_r$ and $P^{l+1}_s$, then $P^{l+1}_r \subset P^l_j$
and $|P^{l+1}_r| < |P^l_j|$, and similarly for $P^{l+1}_s$.
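
The three conditions above can be checked mechanically for a given nesting of partitionings. The sketch below assumes that each subset is represented as a Python set of sample identifiers; the function names are illustrative and not part of the original method.

```python
def check_level(partition, n_samples):
    """Check one level: subsets are pairwise disjoint and their
    sizes add up to the total number of samples."""
    for i, subset_a in enumerate(partition):
        for subset_b in partition[i + 1:]:
            if subset_a & subset_b:      # non-empty intersection
                return False
    return sum(len(subset) for subset in partition) == n_samples


def check_split(parent, children):
    """Check a split: every child is a proper subset of its parent,
    which also implies that it is strictly smaller."""
    return all(child < parent for child in children)
```

The proper-subset operator `<` on sets covers both the inclusion and the strict decrease in size required of a split.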

It can be shown, considering the rules for the
various partitions, that for increasing $l$ the rules
become more specific. Thus at the top of the tree
are the most general (and thus less discriminative)
rules, whereas at the bottom of the tree are the most
specific rules. At the lowest level ($l$ large) we
have the rules corresponding to the individual
samples. However, the tree structure also has
another interesting property.




Figure  3.  Partitioning  and  knowledge  tree. 


If the hierarchical knowledge tree is used for
inference, e.g. on sample w':

{a1(w')=v1(w'),...,an(w')=vn(w')} (whereby now the
decision is not known a priori), then in general one
has the following situation. Consider rule $R^l_{jh}$ at
level $l$, related to subset $P^l_j$:

$R^l_{jh}$: IF {a1(w)=v1(w),...,an(w)=vn(w)}
THEN {$d = d^l_j$, $f^l_j(w)$}.

If this rule is applicable, i.e. the attributes (with
their modalities) of sample w' correspond to those
of the conditional part of the rule, the inferred
decision will be

{$d = d^l_j$, $f^l_j(w')$},

whereby $f^l_j(w')$ is the relative coverage $Q(R^l_{jh})$
($= f^l_j(w)$) of the rule.

Since  every  partition  in  the  tree  is  in  general 
represented  by  a  rule  set  rather  than  by  one  single 
rule,  w'  may  be  covered  by  more  than  one  rule  in 
the  rule  set  and  therefore  determination  of  the 
membership  of  the  inferred  decision  is  not  trivial 
anymore.  Each  applicable  rule  contributes  to  the 
evidence  for  a  new  sample  w'  to  belong  to  the 
partition.  I.e.  the  plausibility  for  a  sample  to  belong 
to  the  partition  increases  if  more  rules  are  firing  at 
the  same  time. 

Simply adding all the relative coverages of
the applicable rules together is not correct, as it
does not take into account possible overlap of the
coverages of the rules. Yet, as we have seen before, a
rule will only be added to the rule set:

1. If the rule covers at least (worst case)
one currently uncovered training
sample, or

2. If rules are identical, i.e. they cover
the same training samples and tie
an equal number of attributes
(equally parsimonious).

For  that  very  reason,  the  effective  relative  coverage 
is  a  more  appropriate  membership  function. 

Suppose a rule set $\{R^l_j\}$ for partition $P^l_j$ at level $l$,
containing n+m rules, of which n rules $\{R^l_{j1}, \ldots, R^l_{jn}\}$
are applicable to a new sample w' and the other m
rules $\{R^l_{j,n+1}, \ldots, R^l_{j,n+m}\}$ are non-applicable, and which
are ordered for decreasing coverage $P^l_j(R^l_{jh})$. The
effective (relative) coverage $Q(R^l_j)$ can then be
calculated by the following two-step algorithm:

1- $Q_1(R^l_j) = P^l_j(R^l_{j1})$;

for i=2 to n:
if $P^l_j(R^l_{ji}) < P^l_j(R^l_{j,i-1})$ or $\mathrm{pars}(R^l_{ji}) \neq \mathrm{pars}(R^l_{j,i-1})$
(i.e. the numbers of tied attributes of rules
$R^l_{ji}$ and $R^l_{j,i-1}$ are unequal),

then $Q_1(R^l_j) = Q_1(R^l_j) + 1$.

2- $Q_2(R^l_j) = |P^l_j| - \{P^l_j(R^l_{j,n+1}) + \cdots + P^l_j(R^l_{j,n+m})\}$.

The first step determines a worst case $Q_1(R^l_j)$
according to all applicable rules. The second step
takes into account the maximum number of
training samples that the non-applicable rules may
cover. Hence, the latter will result in a second
worst case $Q_2(R^l_j)$.

Now the resulting worst case $Q(R^l_j)$ can be found
as: $Q(R^l_j) = \max[Q_1(R^l_j), Q_2(R^l_j)] / |P^l_j|$. As a worst
case $Q(R^l_j)$ is calculated, the membership can be
thought of as a pessimistic lower bound.
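
The two-step algorithm can be sketched in a few lines of code. The version below is a reading of the algorithm as described in the text, not the authors' implementation: the `Rule` record, its field names, the tie-breaking of equal coverages by parsimony when ordering, and the error raised when no rule is applicable are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    coverage: int     # number of training samples of the partition it covers
    parsimony: int    # number of attributes tied by the rule
    applicable: bool  # whether the rule fires on the new sample


def effective_relative_coverage(rules, partition_size):
    """Pessimistic lower bound on the membership of a new sample."""
    # Order applicable rules by decreasing coverage (ties grouped by
    # parsimony, an assumption, so that identical rules end up adjacent).
    applicable = sorted((r for r in rules if r.applicable),
                        key=lambda r: (r.coverage, r.parsimony),
                        reverse=True)
    if not applicable:
        raise ValueError("the algorithm assumes at least one applicable rule")

    # Step 1: start with the largest coverage; every further rule that is
    # not identical to its predecessor adds at least one new sample.
    q1 = applicable[0].coverage
    for previous, current in zip(applicable, applicable[1:]):
        if (current.coverage < previous.coverage
                or current.parsimony != previous.parsimony):
            q1 += 1

    # Step 2: whatever the non-applicable rules can cover at most is
    # subtracted from the size of the partition.
    q2 = partition_size - sum(r.coverage for r in rules if not r.applicable)

    return max(q1, q2) / partition_size
```

With applicable rules of coverage 5, 5 and 3 (the first two identical) and one non-applicable rule of coverage 2 in a partition of 10 samples, step 1 gives 6 and step 2 gives 8, so the effective relative coverage is 0.8.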

It can be shown that for increasing $l$ the range of
uncertainty with respect to the inferred decision will
decrease. This implies that in practice, when making
inferences, it is preferable that rules at the lower
levels of the knowledge tree can fire, since these
give the smallest range of uncertainty.
However, starting at the lowest level leads to a
computational complexity comparable with that of
usual case-based reasoning. Starting at the top of
the tree is more appropriate. One goes down to
lower levels as long as there are rules that fire. If
a level is reached where no rule fires, then one stops. Although
the top-down approach is better than the bottom-up
one, a more adequate approach is sought.

Considering the hierarchical partitioning tree, not
all partitionings will be equally obvious. There
should be levels at which the partitioning
corresponds more closely to the internal test set structure
than at other levels. It may be expected that the rules
of the rule sets related to these levels have a higher
chance of firing than rules of other levels, since
these rule sets are related to a more natural
underlying partitioning. Once such a level has been
found, it can be used as the starting point for making
efficient inferences in the hierarchical knowledge
tree.

In fact the problem is to find the most probable
partitioning as well as the 'true' number of clusters.
Based on the idea of core zones as discussed in
Bezdek et al. (1988), Jennes et al. (1994)
introduced the core zone index (CZI) for estimating
this true number of clusters. CZI is computed for
each level in the partitioning tree. The level with
the largest CZI is called the core zone index level
(see also Van der Lubbe, 1995). It is the starting
point for the inference process in the hierarchical
knowledge tree. Let the core zone index level be
indexed by $l$; then, in search of the most specific
rule, the successive levels $l+1$, $l+2$, etcetera are
considered until a level is reached where no rule can fire.
The last but one level is the level which generates
the ultimate decision(s). If more than one rule fires
at that level, i.e. in the same node, the membership
function with respect to the ultimate conclusion is
determined according to the two-step algorithm mentioned
above. If the lower levels do not succeed, then the
higher levels $l-1$, $l-2$, etcetera are considered. If
rules can fire only at a higher level with respect
to inference on a sample w', the conclusion should
be that the knowledge in the tree is not applicable
to the inference for that specific sample w', for the
core zone index level is the most plausible level.
The fact that only rules at higher levels succeed
implies that a decision about a sample w' can only
be made if the optimal partitioning is neglected.
Then sample w' should be considered as an
exception.

In  practice  starting  at  the  core  zone  index  level 
will  lead  to  a  faster  solution  than  starting  at  the 
bottom  (or  top). 
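
The search strategy that starts at the core zone index level can be sketched as follows. This is a simplified illustration under stated assumptions: the tree is flattened into a list of levels (index 0 being the top), each rule is a pair of a condition dictionary and a decision, the branching of the tree into nodes is ignored, and the core zone index level is taken as given rather than computed. All names are hypothetical.

```python
def rule_fires(conditions, sample):
    """A rule fires if every attribute it ties has the required value."""
    return all(sample.get(attr) == value for attr, value in conditions.items())


def decisions_fired(rules, sample):
    return [decision for conditions, decision in rules
            if rule_fires(conditions, sample)]


def infer_from_czi_level(levels, czi_level, sample):
    """Return (status, level, decisions) for a new sample.

    levels[0] is the most general level; deeper levels are more specific.
    """
    result_level, decisions = None, []
    # Go down from the core zone index level as long as rules fire.
    for level in range(czi_level, len(levels)):
        fired = decisions_fired(levels[level], sample)
        if not fired:
            break
        result_level, decisions = level, fired
    if result_level is not None:
        return ("decision", result_level, decisions)
    # Nothing fired at or below the start level: look at higher levels.
    # A hit there means the tree is not really applicable to this sample.
    for level in range(czi_level - 1, -1, -1):
        fired = decisions_fired(levels[level], sample)
        if fired:
            return ("exception", level, fired)
    return ("exception", None, [])
```

When several decisions are returned for the final level, the membership of the conclusion would still have to be determined with the two-step coverage algorithm; that part is left out of this sketch.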

5.  Abductive  reasoning 

Whereas the methods described above deal
with induction and deduction, it is of interest to
study the possibilities for abductive inference, since
in practice abduction is the most important of the
three Peircean inference forms. In the literature many
methods for abduction are given; compare Van der
Lubbe (1996). These methods are not applicable
here, taking into account the structure of the system
developed until now. Considering abduction as the
inverse of deduction, methods for abductive
inference should be compatible with the methods
followed for deduction. That means that also in the
case of abduction a similar hierarchical knowledge
structure should be chosen as the basis. Clearly, the
hierarchical knowledge tree used in the foregoing
sections cannot be applied for abduction, since the
conditional part of the rule is related to the
attributes and the concluding part to the decision or
conclusion. In the case of abduction we are
reasoning precisely the other way, from decisions to attributes.

Let us assume that we have k decision domains
{D1,...,Dk} instead of just one decision domain. In
the case of deductive reasoning we now have k
knowledge trees. For abductive reasoning there are
n knowledge trees, if there are n attributes. The n
abductive knowledge trees are generated in the
same way as the deductive knowledge trees, i.e.
by means of the modified method of Ho et al.
(1988), taking into account possible exceptions.
Compare Figure 4.

The final abductive result is achieved by
deductive inference in the n abductive knowledge
trees. Therefore, it can be considered as a form of
pseudo-abduction. For each of the attributes an
indication is given of its possible modalities on the
basis of the given decisions or conclusions.

Figure 4. Deductive and abductive knowledge trees.

Experiments were performed by first reasoning
on the basis of the deductive knowledge trees,
from attributes to decisions, followed by
pseudo-abductive reasoning on the basis of the abductive
knowledge trees, from decisions to attributes. In all
cases the original attributes with their modalities
were recovered. This underlines the usefulness of
pseudo-abduction.

Considering the overall system we may conclude
that instead of comparing the unknown sample with
all other samples of the training set, as is the case
in usual case-based reasoning, the hierarchical
knowledge organization enables a more efficient
inference, not least through determination of the
core zone index level. By means of the hierarchical
structure the most probable decision can be inferred
step by step. If inference stops half-way because no
further rules fire at the lower levels, more general
decisions will result than at the bottom of the tree. If the
knowledge tree is not applicable at all to the sample
to be judged, then the sample is considered as an exception.
Then, the only thing one can do is to use case-based
reasoning and compare it with earlier exceptions,
in the hope that some inference then becomes
possible.

6.  Conclusion 

It has been shown that on the basis of a training
set an adequate hierarchical knowledge tree can be
generated by matching concept- and data-driven
partitionings of the sample space and by searching
for representative rules by means of induction.
These hierarchical knowledge trees are appropriate
for incremental learning and for handling
exceptions. Further, they enable inference for new
samples, by means of deductive as well as
abductive reasoning, in an efficient way. This
efficiency is due to the fact that in general not all
samples of the training set need to be considered.
The core zone index and the determination of the
core zone index level guarantee the optimal start
level. Usual case-based reasoning is only applied to
exceptions.
Abstract 

This  paper  introduces  an  approach  to  knowledge  repre- 
sentation as  an  active  process.  More  than  merely  record- 
ing knowledge  in  some  extractable  form,  we  recognize  that 
representation  must  include  the  processes  that  perform  the 
activities  of  insertion  as  well  as  extraction.  Our  intention 
is  to  develop  a  representational  method  that  is  powerful 
enough  and  flexible  enough  that  systems  built  using  it  can 
represent,  reason  about,  and  change  their  own  structure 
and  behavior. 

The  approach  is  based  on  recent  work  on  knowledge  repre- 
sentation for  our  "wrapping"  integration  infrastructures, 
and  on  some  comparative  semiotics:  studies  of  mathemat- 
ical and  linguistic  reasoning. 

The  basic  notion  is  a  generalization  of  set  theory  in  four  di- 
rections: our  collective  objects  have  (1)  indefinite  bound- 
aries and  (2)  indefinite  elements;  (3)  the  context  is  allowed 
to  leak  into  the  interpretation  of  the  objects;  and  (4)  there 
is  a  notion  of  multiplicity  of  structures  that  correspond 
respectively  to  considering  the  same  object  or  class  from 
different  points  of  view,  in  different  contexts.  This  notion 
allows  us  to  model  the  modeling  decisions  explicitly,  and 
to  keep  track  of  the  modeling  simplifications  so  we  can  re- 
late them  to  each  other  and  to  the  processes  that  create, 
change  and  use  them. 

1  Introduction 

The  main  goal  of  this  work  is  to  invent  a  more  interest- 
ing and  flexible  mechanism  for  representing  the  knowledge 
and  information  needed  by,  in,  and  for  constructed  com- 
plex systems  in  general,  and  computing  systems  based 
on  wrappings  in  particular.  We  have  written  elsewhere 
[5]  about  the  barrier  imposed  by  any  well-founded  sys- 
tem, and  our  problem  is  to  try  to  build  something  that 


does  not  have  this  same  barrier  (i.e.,  indefinitely  refinable 
knowledge  structures)  from  something  that  is  inherently 
grounded  in  computer  hardware  (i.e.,  computer  software). 
The  genesis  of  this  work  was  thinking  about  the  differences 
between  mathematical  and  linguistic  reasoning  [7]. 

In  this  section,  we  describe  some  of  the  background  for 
our  attitude,  and  introduce  some  of  the  concepts  and  ter- 
minology we  use. 

1.1  Background 

In  our  research,  we  have  considered  two  different  map- 
ping activities:  modeling  some  phenomenon  in  the  real 
world  (using  simulation  and  modeling  techniques  we  have 
developed  [8]),  and  implementing  some  abstract  idea  (us- 
ing software  engineering  and  software  system  development 
techniques  that  we  have  developed  [5]).  In  both  cases, 
there  is  a  representation  issue,  and  there  are  simplifica- 
tions needed  to  make  the  representation.  One  of  our  goals 
is  to  allow  explicit  treatment  of  the  modeling  process  and 
the  representation  issues. 

We  also  want  to  get  away  from  the  use  of  sets  as  the  only 
model  for  categories  of  knowledge,  since  they  artificially 
limit  the  kinds  of  categories  that  can  be  considered  [4]. 
Sets  have  both  definite  boundaries  and  specific  elemen- 
tary units.  There  are  many  models  of  "uncertainty"  that 
generalize  the  first  constraint:  probability  distributions, 
fuzzy  sets,  belief  functions,  rough  sets,  and  many  others. 
As  near  as  we  can  tell,  there  is  no  appropriate  model  that 
generalizes  the  second  constraint.  This  lack  is  mainly  due 
to  the  nature  of  mathematics  as  our  most  concrete  form 
of  reasoning:  the  elementary  units  must  be  defined  before 
we  can  start  mathematical  reasoning.  This  constraint  is 
not  present  in  linguistic  reasoning  [7]. 

In addition, we want to generalize sets in two other
directions: we want to allow context to leak into the
descriptions [1] [10], and we want to allow individuals to
have different structures in different viewpoints. In many
cases,  the  indefinite  boundaries  of  a  category  are  made 
more  precise  (though  still  somewhat  indefinite)  by  speci- 
fying the  context  under  which  the  category  is  being  con- 
sidered, thereby  allowing  the  context  to  become  part  of 
the  specification  of  the  category.  For  this  reason,  we  make 
context  a  part  of  viewpoint.  We  also  note  that  the  differ- 
ent viewpoints  can  give  the  category  different  structure, 
one  of  which  can  even  be  to  take  the  category  as  an  indi- 
vidual. 


1.2    A  First  Look 

In  this  subsection,  we  introduce  the  concepts  that  define 
our  approach.  Do  not  expect  formal  or  precise  definitions 
here;  they  begin  in  the  next  section.  Terms  in  italics  in 
this  section  are  technical  terms  in  our  approach  that  are 
defined  in  the  next  sections. 

The most important source idea is the "thing"ness of all
concepts: anything we can describe with language can be
"reified", that is, viewed as a "thing", and referred to as a
unit or viewed in a number of different, not wholly
compatible, ways. That is, each viewpoint leads to a possibly
different structure for the individual (and certainly to a
different interpretation of the significance of the structure).
These  "thing"s  can  be  grouped  naturally  into  categories. 
We  think  of  categories  as  including  our  notion  of  concepts 
or  classes.  We  call  these  "thing"s  individuals,  and  think 
of  the  categories  as  collections  of  them.  It  is  this  prop- 
erty that  leads  us  to  consider  all  concepts  as  individuals, 
at  least  under  certain  viewpoints,  and  to  recognize  that 
different  viewpoints  lead  to  different  structures  for  the  in- 
dividuals. We  can  view  a  category  as  an  individual  under 
certain  viewpoints,  and  we  can  view  an  individual  as  an 
assemblage  under  certain  viewpoints.  We  can  also  view  an 
individual  under  a  certain  viewpoint  to  be  a  member  of  a 
category  under  some  viewpoint. 

There  are  two  ways  to  study  grouping  of  individuals  into 
categories.  We  can  begin  with  some  individuals  and  try 
to  identify  appropriate  categories,  or  we  can  begin  with 
categories  and  try  to  examine  individuals  in  them.  Both 
approaches  are  used  here. 

Categories  also  describe  relationships  among  other  cate- 
gories. We  want  to  generalize  the  notion  that  categories 
contain  other  categories,  which  are  separated  by  their  dis- 
tinctions into  different  roles.  A  relationship  has  roles  that 


may  be  filled  by  other  categories,  so  we  can  talk  about 
roles  without  using  the  categories  that  might  fill  them.  In 
this  sense,  they  are  similar  to  frames  with  variable  slots 
for  other  objects. 

A  category  can  be  divided  partially  by  distinctions  into 
several  other  categories,  and  the  distinctions  become  new 
roles  for  the  division  (partially  means  not  necessarily  ei- 
ther disjoint  or  exhaustive,  i.e.,  not  only  into  subsets). 
The  "other"  categories  are  called  constituents.  Distinc- 
tions are  important  only  relative  to  a  certain  viewpoint,  so 
division  of  a  category  is  defined  only  in  terms  of  a  view- 
point. Different  viewpoints  may  divide  the  category  in 
different  ways.  The  divisions  include  our  notions  of  knowl- 
edge structures  (including  abstract  datatypes  such  as  "set 
of  X"  and  other  structures). 

Constituent  categories  for  different  divisions  of  a  category 
generally  have  different  viewpoints  from  each  other.  When 
we  write  that  a  category  "contains"  another,  it  means  that 
there  is  a  viewpoint  according  to  which  the  second  cate- 
gory is  one  of  the  constituents  of  the  first  one. 

If  we  extend  this  decomposition  further  to  many  levels, 
we  get  a  kind  of  hierarchy  of  categories,  related  by  various 
divisions  according  to  various  viewpoints.  This  hierarchy 
becomes  a  kind  of  "ontology"  for  the  application  domains 
involved,  and  it  allows  us  to  make  much  more  flexible  use 
of  the  usual  sorts  of  ontologies  defined  in  knowledge  rep- 
resentation languages. 

1.3    Systems  and  Environments 

A  knowledge  representation  scheme  makes  no  sense  out- 
side a  context  of  use.  The  context  we  consider  for  this 
paper  is  autonomous  systems  [2]  [3]  [9].  An  autonomous 
system  must  somehow  classify  its  surroundings  in  ways 
that  allow  appropriate  responses,  or  have  some  way  to 
recognize  features  of  its  surroundings  that  instigate  ap- 
propriate responses.  An  autonomous  system  also  needs  to 
learn  from  its  experience  in  the  world,  adjusting  its  in- 
ternal "knowledge"  structures  to  improve  its  knowledge  of 
the  world,  and  therefore  improve  the  effectiveness  of  its 
behavior. 

This kind of system has internal structures that include
interpreter definitions in addition to the usual data and
explicit knowledge base structures, because the
architecture of our systems (with their active integration activities
coordinated by wrapping processes) means that the
interpreters are also defined by explicit data structures (e.g.,
wrex programs). The system also has four processes for
knowledge construction and use: "recording" experiences
(as traces) in processible structures; assessing them according
to different criteria; extracting knowledge (as models) in
processible structures; and using it for different purposes.

2    Organization  Rules:  Structures 

We  can  organize  knowledge  either  "top-down",  from  col- 
lectives to  instances,  or  "bottom-up",  from  instances  to 
collectives.  Our  top-down  approach  to  modeling  knowl- 
edge depends  on  one  notion  of  organizational  entity  and 
one  notion  of  organizational  relation.  The  "categories" 
refer  to  classes  of  real  world  phenomena,  and  the  "divi- 
sions" represent  them.  Our  bottom-up  approach  depends 
on  these  notions  as  if  we  were  collecting  "individuals"  into 
classes. 

2.1  Category 

Categories  are  descriptions  of  all  phenomena.  The  funda- 
mental organizational  entity  is  the  category,  which  repre- 
sents anything  that  can  be  labeled  by  a  symbol.  In  fact, 
we  want  our  notion  of  category  to  be  co-extensive  with  the 
notion  of  "concept"  and  the  notion  of  something  that  we 
can describe and reason about. That makes it the same
notion as the one we have considered before [7] for
"symbols": we can make up and use names for categories long
before we know everything (or anything!) that is in the
categories (e.g., black holes, phlogiston, public-key
cryptosystems). This is the connection between our notion and
natural  language  processes:  we  are  attempting  to  model 
the  effectiveness  of  language  in  referring  to  concepts,  even 
though  we  expect  there  to  be  many  more  categories  than 
words  or  phrases  [6]. 

2.2  Division 

The  fundamental  organizational  relation  is  division 
of  a  category 

into  roles  for  other  categories 
according  to  a  viewpoint. 

This  division  is  our  replacement  notion  for  the  usual  no- 
tion of  set  or  structure.  Our  notion  of  division  corresponds 
to  structural  decomposition  of  categories,  but  it  is  much 


more  general;  it  allows  all  structuring  methods  to  coexist. 
We  describe  what  a  viewpoint  is  a  little  later  on. 

Divisions  make  some  things  about  a  category  more  def- 
inite. We  make  divisions  because  the  more  information 
we  have  about  a  category,  the  more  precisely  we  can  use 
it.  We  make  multiple  divisions  because  we  think  of  cate- 
gories as  being  different  in  different  contexts,  or  when  we 
consider  them  from  different  viewpoints. 

One kind of division is to specify that the category plays a
particular role in some other division. This feature allows
us to build up as well as down with the categories.

Another  kind  of  division  shows  the  structure  of  the  cate- 
gory when  considered  from  one  viewpoint,  with  decompo- 
sition corresponding  to  subcategories  of  the  category,  and 
constraints  corresponding  to  distinguishing  properties  of 
the  subcategories. 

We  can  also  describe  the  structure  of  the  individuals  in  a 
category,  at  least  according  to  a  particular  viewpoint,  by 
first  defining  a  division  that  corresponds  to  considering  the 
category  as  a  collection  of  individuals  and  thereby  defining 
a  category  of  those  individuals.  Then  we  can  either  use  a 
division  of  that  category  for  the  individual  structure,  using 
architectures  and  roles  for  other  categories,  with  "boxes 
and  lines"  diagrams  or  hierarchies  for  structure,  or  we  can 
imagine  an  abstract  space  containing  (representations  of) 
the  individuals  as  elements  of  the  space,  or  as  a  subregion 
in  which  they  must  lie,  or  as  a  mapping  from  some  other 
structure  into  the  space. 

We  use  a  viewpoint  to  emphasize  certain  distinctions 
among  entities  in  a  category.  Distinctions  can  be  dif- 
ferences in  content,  structure,  available  information,  or 
intended  use.  They  are  often  explicit,  but  sometimes  un- 
known at  first. 

Distinctions  define  roles  in  the  division  of  a  category.  For 
example,  a  partition  of  a  set  is  a  division  that  has  roles  for 
the  various  subsets,  which  may  or  may  not  be  explicitly 
distinguished.  The  different  components  of  a  structured 
entity  are  roles  in  the  division  that  defines  that  decom- 
position. The  different  steps  of  a  structured  process  are 
roles  in  the  division  of  that  process  into  parts.  Some  other 
common  ways  to  write  about  divisions  are  that  a  category 
"contains"  entities,  or  that  a  category  "describes  a  rela- 
tionship" among  entities. 
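
To make the relation between categories, divisions, roles and viewpoints more tangible, the following sketch shows one possible data structure. It is an illustrative assumption only, not the representation used in our systems: the class and function names are hypothetical, and a viewpoint is reduced to an opaque label.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Category:
    name: str                                   # any symbol that labels something
    divisions: list = field(default_factory=list)


@dataclass(eq=False)
class Division:
    category: Category    # the category being divided
    viewpoint: str        # opaque label standing in for a full viewpoint
    roles: dict           # role name -> Category filling it, or None if unfilled


def divide(category, viewpoint, roles):
    """Record a division of `category` into roles according to a viewpoint."""
    division = Division(category, viewpoint, dict(roles))
    category.divisions.append(division)
    return division


def contains(outer, inner):
    """True if, under some viewpoint, `inner` is a constituent of `outer`."""
    return any(filler is inner
               for division in outer.divisions
               for filler in division.roles.values())


# The same category divided according to two different viewpoints.
integers = Category("integers")
even, odd = Category("even integers"), Category("odd integers")
negative = Category("negative integers")
divide(integers, "parity", {"even": even, "odd": odd})
divide(integers, "sign", {"negative": negative, "non-negative": None})
```

A role may remain unfilled (here "non-negative"), which reflects that roles can be talked about without naming the categories that fill them, and a category may carry any number of divisions, one per viewpoint.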
2.3  Individuals 

Individuals  are  particular  objects  or  agents  or  actions  or 
events  or  relationships,  usually  associated  with  one  or 
more  categories.  The  individuals  are  the  "things"  we  re- 
ferred to  in  our  introduction.  An  individual  is  intended 
to  represent  a  real-world  entity,  the  same  way  a  category 
does.  In  many  cases,  what  we  would  normally  consider  to 
be  an  individual  is  not  uniquely  specified,  so  we  must  make 
do  with  a  class  of  possibilities.  In  this  case,  we  are  using 
the  individual  as  a  "typical"  member  of  a  category  (which 
derives  from  a  particular  viewpoint  on  the  individual). 

We  talk  about  individuals  as  members  of  a  category,  even 
though  membership  requires  a  viewpoint,  and  therefore  is 
really  only  for  divisions.  That  means  that  an  individual 
must  be  a  member  of  a  division,  not  a  category.  Every 
individual  is  therefore  an  instance  of  one  or  more  divisions, 
and  we  consider  the  corresponding  categories  as  "types" 
of  the  individual  (according  to  the  viewpoint  that  leads  to 
the  division). 

Sometimes  we  are  given  an  individual  (including  the  cur- 
rent situation),  and  we  try  to  form  or  understand  the  cat- 
egories it  may  lie  in.  We  may  have  to  infer  one  or  more 
viewpoints,  and  consider  different  structures  for  the  indi- 
vidual. 

As  we  get  deeper  in  the  category  hierarchy,  the  categories 
become  more  and  more  concrete,  and  we  eventually  get  to 
sets  and  data  structures,  which  in  our  scheme  are  divisions 
of  categories. 

Divisions  also  apply  to  individuals,  but  we  usually  try  to 
put  the  individual  into  a  category  first,  so  that  the  divi- 
sions apply  to  the  category  instead. 

2.4  Viewpoint 

We  use  viewpoints  to  distinguish  different  structures  for 
our  categories,  either  for  the  individuals  in  the  category 
or  for  the  category  itself.  The  viewpoint  defines  some  of 
the  assumptions  and  modeling  decisions  that  are  reflected 
in  the  presented  division. 

A  viewpoint  is  a  focus  within  a  context.  A  focus  is  a  pre- 
sumed scope  and  level  of  detail,  which  defines  the  scope  of 
quantifiers,  and  the  level  of  detail  required  to  express  the 
viewpoint.  A  different  focus  means  a  different  viewpoint, 
which  means  a  different  division.  Different  viewpoints  of 
a  category  often  differ  only  in  their  focus.  The  scope  de- 


fines the  extent  of  the  conceptual  units,  both  structures 
and  processes,  and  the  level  of  detail  defines  their  concep- 
tual "size".  It  is  part  of  our  ongoing  research  to  develop 
notations  that  express  or  define  scope  and  level  of  detail. 
There  is  something  "compatible"  about  the  "size"  of  the 
conceptual  units  in  a  focus  that  makes  it  seem  "natural". 

It  is  important  in  this  approach  that  the  same  category 
may  have  many  different  divisions,  each  with  a  different 
viewpoint,  and  that  we  can  therefore  easily  imagine  adding 
new  divisions  as  other  viewpoints  become  interesting. 

A  viewpoint  implies  a  choice  of  descriptive  language  for 
the  corresponding  division,  and  a  corresponding  language 
interpreter.  The  viewpoint  is  outside  the  division,  but 
refers  to  it. 

One  of  the  difficult  outstanding  research  problems  is  to 
determine  how  and  when  viewpoints  are  related  to  each 
other. 

2.5  Context 

We have long maintained that context provides all the
interpreters for symbols, and that semantics _is_ the
interpretation of syntax. For this paper, it means that context
provides all the interpreters for categories, and that
semantics is the interpretation of divisions in categories.

A  context  does  not  _do_  anything.  It  is  a  passive  provider 
of  and  reference  to  information  and  processes. 

An  interpreter  is  a  mapping  from  elements  to  actions  or 
objects,  and  from  constructs  to  patterns  of  action  or  struc- 
tures. So  just  as  focus  is  a  collection  of  categories  that  are 
used  as  units,  context  is  a  collection  of  categories  that  are 
used  as  relationships  (mappings)  from  some  categories  to 
others. 

There  is  a  notion  of  specificity  in  context.  There  is  a  rela- 
tion of  "more  concrete"  among  divisions,  and  interpreta- 
tion is  intended  to  map  towards  more  concrete  categories. 
The  simplest  implementation  of  context  is  a  collection  of 
mappings  from  attributes  to  values,  but  others  are  possi- 
ble. 

A  context  is  a  state  of  knowledge,  which  is  a  collection 
of  assumptions,  specializations,  relationships,  and  current 
situation. 

A  context  contains  or  refers  to  the  domain-specific  prior 
information  structures  and  interpreters.  The  environment 
is the situation-specific current dynamic activity to which
the system must respond.

It  is  never  the  case  that  we  know  all  of  the  context  of  a 
division,  so  it  is  essential  that  we  be  able  to  augment  the 
context  at  any  time.  When  we  write  of  a  category  "in" 
a  context,  we  actually  mean  the  division  of  the  category 
according  to  the  viewpoint  given  by  the  context  and  the 
focus  of  the  category. 

3    Process  Rules:  Behaviors 

Our  knowledge  representation  defines  more  than  just  what 
the  concepts  are  and  how  they  are  related.  It  also  defines 
how  things  change  and  what  can  happen. 

3.1    Activity  and  Action 

Everything  that  happens  is  an  activity.  Every  activity  has 
an  effect,  which  we  may  or  may  not  be  able  to  detect. 
All  activities  take  time,  which  may  or  may  not  be  repre- 
sentable  as  instantaneous,  according  to  the  time  scale  in 
use.  An  activity  description  includes  all  intermediate  steps 
or  stages  (we  do  not  presume  discrete  steps).  An  activity 
is  a  kind  of  category.  An  action  is  a  kind  of  activity.  An 
activity  can  be  made  up  of  actions. 

An action is an indivisible activity. Its description is a
kind of division that has the form of a collection of
constraints among prior and posterior variables (i.e., roles for
individuals), and their times of occurrence. There is no
intermediate "step" in an action; the values of the affected
variables can only be queried before the action or after
the action. During the action, the values are presumed
to be the same as at the start of the action, but they
are also considered to be "in transit", so they cannot be
relied on to hold fixed for any positive amount of time.
An  activity  description  allows  querying  of  actions  at  all 
intermediate  times.  For  example,  a  discrete  sequence  of 
steps  implements  an  activity  as  a  set  of  actions,  whereas 
a  continuous  motion  is  an  activity  for  which  we  know  all 
the  intermediate  places  as  a  function  of  time. 

A division is a model in our usual sense, with component
roles for other models and interaction styles which
distinguish different classes of divisions. Actions on a division
are converted into actions on the roles, the interactions or
both. This conversion is part of the definition of a
division. Different kinds of divisions correspond to different
collections of conversion processes.


Every  action  is  a  change  of  something.  Action  classes  have 
pre-  and  post-  conditions,  and  descriptions  of  effects. 

Every  action  has  an  effect,  possibly  an  actor  (which  can  be 
an  agency  instead  of  an  individual),  and  an  actee,  which 
is  usually  one  or  more  roles  for  categories  (or  individuals). 

The  difference  between  an  event  and  an  action  is  that 
events  are  instantaneous;  we  don't  use  them.  Actions  can 
take  some  amount  of  time. 

3.2  Situations 

Situations  are  particular  parts  of  the  environment  (as  if 
they  were  a  component  or  aspect  of  the  environment,  ex- 
cept that  we  don't  necessarily  know  anything  else  about 
the  environment).  A  situation  describes  part  of  the  envi- 
ronment at  a  particular  time,  and  in  that  sense  is  like  a 
context  individual;  the  difference  between  the  situations 
and  other  parts  of  context  is  that  situations  are  instances 
of  the  environment  in  time. 

There  is  some  interesting  new  mathematics  associated 
with  situations  [1],  and  much  work  on  situated  planning 
[10]  and  situated  behavior  [9],  and  we  believe  that  our  ap- 
proach allows  us  to  describe  the  situations,  so  that  they 
may  be  made  explicit  in  the  computations. 

Computationally,  the  application  of  an  action  to  a  situ- 
ation has  three  parts:  the  situation  provides  individuals, 
the  action  roles  match  up  to  those  individuals,  and  the 
action  effects  change  those  individuals. 
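
The three parts listed above can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation of the system under construction: the class names, the matching of roles by a `kind` label, and the in-place update of individuals are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Individual:
    kind: str
    state: Dict[str, object]


@dataclass
class Situation:
    time: float
    individuals: List[Individual]


@dataclass
class Action:
    roles: Dict[str, str]                                   # role name -> required kind
    precondition: Callable[[Dict[str, Individual]], bool]
    effect: Callable[[Dict[str, Individual]], None]
    duration: float                                         # actions take time


def apply_action(action: Action, situation: Situation) -> Optional[Situation]:
    """Return the posterior situation, or None if the action does not apply."""
    # Parts 1 and 2: the situation provides individuals; roles are matched to them.
    binding: Dict[str, Individual] = {}
    used = set()
    for role, kind in action.roles.items():
        match = next((ind for ind in situation.individuals
                      if ind.kind == kind and id(ind) not in used), None)
        if match is None:
            return None
        binding[role] = match
        used.add(id(match))
    if not action.precondition(binding):
        return None
    # Part 3: the effects change the matched individuals (in place, for brevity).
    action.effect(binding)
    return Situation(situation.time + action.duration, situation.individuals)


door = Individual("door", {"open": False})
robot = Individual("robot", {})
open_door = Action(
    roles={"actor": "robot", "actee": "door"},
    precondition=lambda b: not b["actee"].state["open"],
    effect=lambda b: b["actee"].state.update(open=True),
    duration=2.0,
)
after = apply_action(open_door, Situation(0.0, [door, robot]))
```

Only the prior and posterior values are visible here, which matches the rule that an action has no queryable intermediate step.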

For  these  purposes,  a  situation  can  be  thought  of  as  a 
time  and  a  collection  of  individuals.  It  is  not  a  set.  It  is  a 
partly  determined  category.  Such  a  category  is  determined 
by  coincidence  in  time  and  space  rather  than  coincidence 
in  defining  properties. 

3.3  Current  Behaviors 

We  distinguish  the  rules  of  behavior,  which  are  about 
classes  of  situations  and  the  appropriate  corresponding 
activities,  from  the  actual  behavior,  which  is  about  the 
current  situation  and  the  current  activities  as  time  pro- 
gresses. Classes  of  possible  behaviors  can  be  defined  by 
stringing  together  the  rules  according  to  classes  of  possi- 
ble situations,  but  they  must  be  specialized  to  represent 
the  actual  (or  presumed)  situation. 

What actually happens in a system (or, more properly,
within a system _and_ between it and its environment) is
a temporal pattern of instances of activities. We call them
"activities" instead of "events" because the latter term is
used more often for instantaneous events, which we allow
only as a conceptual shorthand.

While  the  system  is  operating,  the  relevant  part  of  the 
"current"  situation  at  any  given  time,  or  even  the  cur- 
rently available  knowledge  about  the  current  situation  at 
that  time,  is  almost  never  enough  to  identify  it  uniquely. 
There  is  therefore  always  a  class  of  possible  situations. 
The class is organized not so much by its internal
structure as it is by its components' coexistence in time.

4  Conclusions 

The  notions  we  have  discussed  here  provide  a  different 
approach  to  representing  knowledge  in  a  complex  system. 
We  believe  that  they  allow  incorporation  of  most  other 
methods  of  representation,  and  that  something  along  these 
lines  is  necessary  to  allow  the  representation  to  take  the 
current  context  and  environment  into  account. 

We have provided a number of ideas for what a new
representational approach would have to provide. Specifically,
we have stressed how "representing" must be changed from
static notions of organization and definitions (data
dictionary style) to one that includes the active
processes for interpreting the knowledge constructs. With this
processing, plus the metaknowledge required for our
software architecture approach [5] [9], we will begin to have
the capabilities needed to handle the explicit
contextual information that we consider essential.

The  current  research  project  is  constructing  a  software  sys- 
tem that  implements  this  approach,  and  in  which  we  can 
perform  various  studies  of  effective  knowledge  collection, 
reorganization, and extraction.

ABSTRACT

A semiotic system is a multi-layer system consisting of
multi-layer networks of sign-frames. One of the main components
of any semiotic system is a knowledge-based repository
(semiotic knowledge base). A semiotic knowledge base differs
from an ordinary one in the structure of its basic
units of knowledge representation (the sign structure) and in
the set of operations over these basic units.

Multi-layer logic (MLL), developed by S. Ohsuga and H.
Yamauchi, is proposed as the knowledge representation language
in semiotic knowledge bases. MLL is an appropriate apparatus for
describing knowledge structuring and aggregation.
We have developed an extension of the MLL syntax that
increases the efficiency of deductive inference. A
Skolemization algorithm and a unification algorithm for the
extended MLL syntax have been developed. The system KM (Knowledge
Model), based on deductive inference in MLL, has been developed as a
modelling tool for complex-structured problem domains in
semiotic knowledge bases. A conceptual
language for modelling a complex-structured problem domain has
been investigated, and a translator of queries from the conceptual
language into MLL has been designed.

KEYWORDS:  semiotic  system,  knowledge  base,  knowledge 
representation  language,  problem  domain,  multi-layer  logic. 

1.  INTRODUCTION 

The widespread use of computer technology in different
spheres of human activity has made it necessary to develop
semiotic systems.

A semiotic system is a multi-layer system consisting of
multi-layer networks of sign-frames. The sign used in
semiotic systems has three entities: a name, a concept and
a representation, which are interpreted as syntax, semantics
and pragmatics. A sign structure is matched with a frame
structure that has similar entities: a name, a protoframe
and a frame example. One of the main components of any
semiotic system is a knowledge-based repository (semiotic
knowledge base). A semiotic knowledge base differs
from an ordinary one in the structure of its basic units
of knowledge representation (the sign structure) and in
the set of operations over these basic units.
The central problem of semiotic system development is
problem domain modelling. To model
complex-structured problem domains, it is necessary to choose a
knowledge representation language that is suitable for
the description of such domains.
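
The correspondence between a sign (name, concept, representation) and a frame (name, protoframe, frame example) can be pictured with a small data structure. The sketch below is only an illustration under our own assumptions: the class `SignFrame`, its field names and the slot-checking rule are hypothetical and are not taken from the KM system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SignFrame:
    name: str                                   # sign name / frame name (syntax)
    protoframe: Dict[str, str]                  # concept as slot -> type (semantics)
    examples: List[Dict[str, object]] = field(default_factory=list)  # frame examples

    def add_example(self, **slots: object) -> None:
        """Add a frame example; slots must be declared in the protoframe."""
        unknown = set(slots) - set(self.protoframe)
        if unknown:
            raise ValueError(f"slots not in protoframe: {sorted(unknown)}")
        self.examples.append(dict(slots))


# Illustrative data only.
airplane = SignFrame("airplane", {"flight": "str", "airport": "str"})
airplane.add_example(flight="F100", airport="A")
```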


2.  A  FORMAL  MODEL  OF  A  COMPLEX- 
STRUCTURED  PROBLEM  DOMAIN  IN 
SEMIOTIC  SYSTEMS 

As a formal model of a complex-structured problem domain
we have proposed a hybrid model that combines the
"program engineering" paradigm [1,2,3,4] and the
"knowledge engineering" one [5,6]; i.e., an object-oriented
approach [7,8] and procedures, together with a model for
representing knowledge about a problem domain, should be
used to represent a formal description of a problem
domain in semiotic systems [9,10,11,12].
We have proposed that a logical model based on
multi-layer logic (briefly, MLL) [13,14] should be used as the
formalism for representing knowledge about a complex-
structured problem domain. MLL is an integration of the
logical approach and of an approach based on semantic
networks. It may be considered as an object-oriented first-
order predicate calculus that describes knowledge
structuring and aggregation.

By a slash we mean a kind of delimiter used in the prefix
of a formula. Thus, the simple slash (Qx/X) is used to
denote that x is an element of the set X (x ∈ X), the simple
"thick" slash (Qx/ X) denotes that x is defined on a set
whose elements are the components of the object X
(X V x), while the double slash (Qx//X) denotes that x is
defined on a set whose elements are parts of the object X
(X ► x).

We have developed an extension of the MLL syntax that
increases the efficiency of deductive inference.
The extended MLL syntax is presented below.
Alphabet:

(1) constants: a,b,c,...,X,Y,Z (constant sets),...

(2) variables: x,y,z,...

(3) function symbols: f,g,h,...

(4) predicate symbols: P,Q,R,...

(5) quantifiers: ∀, ∃

(6) logic connectives: ¬, &, ∨, →

(7) auxiliary symbols: #,*,/, /,//,{,},(,)

Terms:

(1). Any constant or variable is a term.

(2). If f is an n-ary function symbol and t_1, t_2, ..., t_n are
terms, then f(t_1, t_2, ..., t_n) is a term.

(3). All terms are obtained by applying rules (1) and (2).

The rules for constructing well-formed formulas (WFF) of
the extended MLL syntax are as follows.

F1. If P is an n-ary predicate symbol and t_1, t_2, ..., t_n are
terms, then P(t_1, t_2, ..., t_n) is a WFF.

F2. If F and G are WFF, then ¬F, F & G, F ∨ G, F → G are
WFF.

F3. If F is a WFF and x is an object variable, then

(1). (∀x / y) F and (∃x / y) F are WFF, where y is a
constant or variable (here / is an ordinary slash).

(2). (∀x / y) F and (∃x / y) F are WFF, where y is a
constant or variable (here / is a fat slash).

(3). (∀x // y) F and (∃x // y) F are WFF, where y is a
constant or variable (here // is a double slash).

(4). (∀(x / Z)//y) F and (∃(x / Z)//y) F are WFF, where
y is a constant or variable and Z is a constant set.

F4. There are no other rules for constructing WFF.
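
The term rules and the rules F1 to F4 can be read as a recursive data type with a well-formedness check. The following is a minimal sketch under our own assumptions, not the parser of the KM system: the class names are hypothetical, and the three slashes are encoded by the strings "/", "fat/" and "//".

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Func:                      # term rule (2)
    symbol: str
    args: Tuple["Term", ...]


Term = Union[Const, Var, Func]


@dataclass(frozen=True)
class Atom:                      # rule F1
    predicate: str
    args: Tuple[Term, ...]


@dataclass(frozen=True)
class Not:                       # rule F2, negation
    body: "Formula"


@dataclass(frozen=True)
class Binary:                    # rule F2, op is "&", "v" or "->"
    op: str
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quantified:                # rule F3
    quantifier: str              # "forall" or "exists"
    var: Var
    slash: str                   # "/", "fat/" or "//"
    domain: Union[Const, Var]    # the y of rule F3
    body: "Formula"
    base_set: Optional[Const] = None   # the Z of rule F3(4)


Formula = Union[Atom, Not, Binary, Quantified]


def is_term(t: object) -> bool:
    if isinstance(t, (Const, Var)):
        return True
    return isinstance(t, Func) and all(is_term(a) for a in t.args)


def is_wff(f: object) -> bool:
    if isinstance(f, Atom):
        return all(is_term(a) for a in f.args)
    if isinstance(f, Not):
        return is_wff(f.body)
    if isinstance(f, Binary):
        return f.op in {"&", "v", "->"} and is_wff(f.left) and is_wff(f.right)
    if isinstance(f, Quantified):
        prefix_ok = (f.quantifier in {"forall", "exists"}
                     and f.slash in {"/", "fat/", "//"}
                     and isinstance(f.var, Var)
                     and isinstance(f.domain, (Const, Var)))
        # F3(4): a base set Z is allowed only with the double slash.
        base_ok = f.base_set is None or (f.slash == "//" and isinstance(f.base_set, Const))
        return prefix_ok and base_ok and is_wff(f.body)
    return False                 # rule F4: nothing else is a WFF


# (exists (x/program_component)//#P) Operate(x, t)
good = Quantified("exists", Var("x"), "//", Const("#P"),
                  Atom("Operate", (Var("x"), Var("t"))),
                  base_set=Const("program_component"))
bad = Quantified("exists", Var("x"), "/", Const("#P"),
                 Atom("Operate", (Var("x"), Var("t"))),
                 base_set=Const("program_component"))
```

Here `is_wff(good)` holds, while `bad` is rejected because it attaches a base set to an ordinary slash, a form that none of the rules F3(1) to F3(4) produces.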

An inference algorithm for the extended MLL syntax has been
developed.

At the present time the system KM (Knowledge Model),
based on deductive inference in MLL, has been designed.

3.  THE  MODELLING  SYSTEM  KM  OF  A 
COMPLEX-STRUCTURED  PROBLEM 
DOMAIN 

The system KM (Knowledge Model) has been developed as a
modelling tool for complex-structured problem domains. The
system KM allows the user:

•  to construct the ISA-hierarchy of object classes;
•  to define the attributes of object classes and the
relations between object classes;
•  to construct the Part-of hierarchy of object classes;
•  to define the class representatives;
•  to construct the Part-of hierarchy of representatives;
•  to browse the class objects and the hierarchies;
•  to describe logic formulas;
•  to receive answers to queries.

The architecture of the KM system is shown in Figure
1.

The  main  components  of  the  system  KM  are: 

•  subsystem    of    the    problem    domain  model 
controlling; 

•  subsystem  of  modelling; 

•  subsystem  of  browser; 

•  knowledge  base; 

•  subsystem  of  the  deductive  inference. 

The subsystem of deductive inference, based on inference in
MLL, serves for:

•  logical verification of the information represented in a
knowledge base;

•  obtaining attribute values and extensions of
relations, which allows the extensional
component of the knowledge base to be "compressed";

•  deriving new knowledge from the knowledge
represented in the knowledge base;

•  obtaining answers to queries.


[Figure 1. The architecture of the KM system. Recoverable block
labels: subsystem of the problem domain model controlling;
knowledge base; subsystem of browser; subsystem of the
deductive inference.]

Example  1. 

A program component x, included in the control
support system (CSS) #P, provides landing of an
airplane y, assigned to the airport #A, if there are:

•  a PC t, on which x operates;

•  a radiolocation station (RLS) s, connected with PC
t;

•  an information flow #I1, containing a message flow t1
accepted by RLS s and containing a message class
r1 that describes the airplane y and is processed by the
program component x;

•  an information flow #I2, produced by the program
component x and accepted by the airplane y,
containing a message flow t2, transferred by RLS s, in
which there is a message class r2 containing
information about the landing of airplane y.

The structures of the problem domain are shown in
Figure 2.

Formalization in MLL with extended syntax:

(∃(x/program_component)//#P) (∀(y/airplane)//#O)
(∃(s/RLS)//#A) (∃(t/PC)//#A) (∃(t1/message_flow)//#I1)
(∃(r1/message_class)//t1) (∃(t2/message_flow)//#I2)
(∃(r2/message_class)//t2)

Operate(x,t) & Connect(s,t) & Accept_RLS(s,t1) &
Describe(r1,y) & Process(x,r1) & Transfer_RLS(s,t2)
& Produce(x,r2) & Accept_airplane(y,r2) →
Landing(x,y)

[Figure 2. The structures of the problem domain. Recoverable
node labels: Message_Flow and Message_Class nodes labelled
MF11, MF13, MF21, MF23, MC111, MC112, MC113, MC211, MC212
and MC213. The legend distinguishes the "Component-of" (V),
"Element-of" (∈) and "Part-of" (►) relations.]

A conceptual language for modelling a complex-
structured problem domain has been developed, and a translator of
queries from the conceptual language into MLL has been designed.

Examples of queries.

Example  2. 

Let us determine the program components that are part
of the developed system P and the airplanes that are part
of the controlled object O between which the specific
relation "provide_fly_up_to_landing" holds.

Formalization in MLL:

(∃(x/Program_component)//#P) (∃(y/airplane)//#O)
provide_fly_up_to_landing(x,y) → .

Example 3.

Let us determine the program components that are part
of the developed system P and the computers that are
part of the airport A between which the "function" relation
holds.

Formalization in MLL:

(∃(x/Program_component)//#P) (∃(t/computer)//#A)
function(x,t) → .
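
To make the reading of such queries concrete, the sketch below answers Example 2 over a toy knowledge base by plain enumeration. It is an illustration under our own assumptions only: the facts, the dictionaries `PART_OF` and `ELEMENT_OF`, and the function names are hypothetical, and the enumeration stands in for, and is not, the deductive inference (Skolemization and unification) used by the system KM.

```python
from itertools import product
from typing import List, Set, Tuple

# Hypothetical toy knowledge base (illustrative names and facts only).
PART_OF = {"#P": {"pc1", "pc2", "doc1"}, "#O": {"plane1", "plane2"}}
ELEMENT_OF = {"Program_component": {"pc1", "pc2"}, "airplane": {"plane1", "plane2"}}
FACTS: Set[Tuple[str, str, str]] = {("provide_fly_up_to_landing", "pc1", "plane2")}


def domain(cls: str, obj: str) -> List[str]:
    """Range of a variable bound by (Q(x/cls)//obj): parts of obj that belong to cls."""
    return sorted(PART_OF[obj] & ELEMENT_OF[cls])


def answer(relation: str, x_spec: Tuple[str, str], y_spec: Tuple[str, str]) -> List[Tuple[str, str]]:
    """All pairs (x, y) from the two domains for which the relation is a known fact."""
    xs, ys = domain(*x_spec), domain(*y_spec)
    return [(x, y) for x, y in product(xs, ys) if (relation, x, y) in FACTS]


pairs = answer("provide_fly_up_to_landing",
               ("Program_component", "#P"), ("airplane", "#O"))
```

The double slash restricts each variable to the parts of an object, and the inner simple slash restricts it further to the elements of a class, which is why `doc1` never appears as a candidate for x.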

The system KM supports "a free connection" [15] of the KB and
a DB under the Paradox DBMS. This implementation of the
system KM makes it possible to use all the facilities of the
Paradox DBMS, such as distributed processing, high performance,
complex verification, support for data integrity and safety,
DB failure recovery, and support for very large DBs.
The system KM runs under Windows 95. The
development language of the system KM is Borland C++.

4 .  CONCLUSIONS 

We have considered a new approach to designing semiotic
systems. A knowledge representation language for
semiotic systems has been proposed. The system KM has been
developed as a tool for modelling a complex-structured
problem domain.

This  work  was  supported  by  the  Russian  Fund  of 
Fundamental Researches (project code 96-01-00125).

Abstract

Pattern  recognition  is  a  vital  part  of  many  intelligent  systems. 
Various  uses  of  intelligence  in  several  new  algorithms  are 
described  for  a  range  of  image  processing  problems:  detection, 
algorithm  fusion,  feature  extraction,  object  representation, 
optimal  features  for  both  representation  and  discrimination, 
object  classification  and  pose  estimation.  The  techniques 
employed  include  biologically  motivated  Gabor  wavelet  filters, 
new  neural  nets,  and  nonlinear  image  processing.  Test  results  on 
IR,  SAR,  robotic  and  active  vision  data  are  presented. 

1. Detection

This is the first step in general image processing or scene
analysis: locate all objects of interest in the scene independent
of object distortions and object class. These regions of interest
(ROIs) are then further processed to reduce false alarms and
determine the class and pose of the contents of each ROI. There
is no one magic detection filter or algorithm; similarly, humans
do not use just one technique for detection. The best results we
have seen are obtained by applying several different detection
algorithms to a scene and then fusing the output results to obtain
one detection plane with peaks at all ROI locations.

Algorithms 

The best detection algorithms we have found are the
morphological wavelet transform (this applies morphological
and Gabor wavelet filters to the scene and fuses their outputs)
[1] and a Gabor basis function filter [3]. The morphological
processing removes spatially-varying background information
while the Gabor processing locates clutter regions of the scene
where target detection confidence will be low; the Gabor basis
function filters detect objects. All results are distortion-
invariant.

Algorithm  Fusion 

To  combine  the  results  from  different  detection  filters,  the 
best  algorithm  fusion  technique  we  found  [1]  was  to  use  fuzzy 
logic  concepts  to  produce  nonlinear  membership  mapping 
functions  to  apply  to  each  detection  algorithm's  analog  output. 
These  map  the  outputs  for  the  different  algorithms  to  the  same 
range. We use different mapping functions for different ranges
of object contrast. We then use pointwise minimum fusion (a
fuzzy logic AND) to fuse the different algorithm outputs.
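
The fusion step can be sketched in a few lines. This is only an illustration under our own assumptions: the text does not give the form of the membership functions, so a sigmoid with hypothetical `center` and `slope` parameters is used here, and the sample planes are invented.

```python
import math
from typing import Callable, List, Sequence

Plane = List[List[float]]


def sigmoid_membership(center: float, slope: float) -> Callable[[float], float]:
    """Hypothetical nonlinear membership function mapping a raw output to [0, 1]."""
    def mu(value: float) -> float:
        return 1.0 / (1.0 + math.exp(-slope * (value - center)))
    return mu


def fuse_min(planes: Sequence[Plane],
             memberships: Sequence[Callable[[float], float]]) -> Plane:
    """Map each detector output to a common range, then take the pointwise minimum."""
    mapped = [[[mu(v) for v in row] for row in plane]
              for plane, mu in zip(planes, memberships)]
    rows, cols = len(mapped[0]), len(mapped[0][0])
    return [[min(m[r][c] for m in mapped) for c in range(cols)] for r in range(rows)]


# Two invented detector outputs with very different analog ranges.
morph_out = [[0.1, 5.0], [0.2, 0.3]]
gabor_out = [[10.0, 90.0], [80.0, 20.0]]
fused = fuse_min([morph_out, gabor_out],
                 [sigmoid_membership(2.5, 2.0), sigmoid_membership(50.0, 0.1)])
```

Because the minimum acts as a fuzzy AND, a location keeps a high value only when every detector responds strongly there, as at position (0, 1) in this example.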

These  techniques  thus  use  a  variety  of  different  image 
processing  techniques  (morphology,  distortion-invariant  filters, 
wavelets,  fuzzy  logic,  fusion). 


IR  and  SAR  Results 

For IR data (360° aspect distortions of 8 different object
classes in severe clutter, with poor object contrast) we achieved
excellent results (Table 1) for detection (PD) and false alarms
per scene (PFA). Algorithm fusion reduces PFA by up to a factor
of 10. Subsequent processor stages further reduce PFA to below 0.1
per km² with Pc = 95% correct recognition (Section 2).

Table 1: Algorithm fusion detection results (TRIM-2)

PD (%)    PFA
90.6      3.1
80        0.9

For SAR data [2], we designed a new set of distortion-
invariant filters (about 3 per class). These are applied to an input
scene; a winner-take-all (WTA) output above a threshold indicates
an ROI, and the filter with the largest output above the threshold
denotes the class of the ROI data. This technique requires many
fewer templates than model-based methods. The filters are shift-
invariant and hence easily implemented in DSP hardware. We
tested these filters on two different real SAR target databases
and a synthetic database of SAR target models. For clutter, we
used half of the man-made and natural clutter false alarms over
76 km² that passed the first two stages of the Lincoln Labs SAR
processor. For four-class data with 360° of distortion, we
obtained PD = 99.8%, Pc = 96.6% and a superb PFA = 0.026/km².
We tested the filters at SNR = 0 dB and found a negligible
decrease in Pc of 0.5%. We tested the filters for 25% object
obscuration and achieved Pc = 80%. Thus, these intelligent
filters are robust.

New  Intelligent  Filter/fusion  Design 

We  now  advance  a  new  neural  net  (NN)  data  driven  concept 
to  use  intelligence  to  automate  the  design  of  several  different 
Gabor  wavelet  detection  filters  (selection  of  their  parameters) 
AND  the  simultaneous  design  of  the  combination  coefficients  to 
be  used  to  fuse  different  filter  outputs  [3].  This  represents  an 
automated  data-driven  intelligent  image  processing  system.  We 
discuss this approach using Gabor wavelet filter functions

$$g_n = e^{-\pi (x^2/a^2 + y^2/b^2)} \, e^{-j 2\pi r \omega \cos(\theta - \phi)} \qquad (1)$$

When such a filter is applied to a scene, its analog output
denotes the amount of a given spatial frequency ω at a given
orientation φ present in each local scene region defined by a and
b. We consider Gabor wavelet (GW) functions with modulation
in 1-D only; we select ω = 1/2d (where d = a or b for horizontal
and vertical features respectively). We use filters formed from
the real and the imaginary parts of the Gabor function (GF); real
Gabor filters are proven blob detectors and imaginary Gabor
filters are proven edge detectors.
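
The real and imaginary filters can be generated directly from the Gabor function. The sketch below is an illustration under stated assumptions: (r, θ) are taken to be the polar coordinates of (x, y), so that r cos(θ - φ) = x cos φ + y sin φ; ω = 1/2d is read as 1/(2d); the Gaussian envelope is taken to be decaying; and the choice φ = 0 for d = a and φ = π/2 for d = b is ours. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Tuple


def gabor(x: float, y: float, a: float, b: float, omega: float, phi: float) -> complex:
    """Gabor function: Gaussian envelope times a complex sinusoid along direction phi."""
    envelope = math.exp(-math.pi * (x * x / (a * a) + y * y / (b * b)))
    carrier = cmath.exp(-2j * math.pi * omega * (x * math.cos(phi) + y * math.sin(phi)))
    return envelope * carrier


def gabor_kernels(half: int, a: float, b: float,
                  horizontal: bool = True) -> Tuple[List[List[float]], List[List[float]]]:
    """Sample the real (even, blob-type) and imaginary (odd, edge-type) parts on a grid."""
    d = a if horizontal else b
    omega = 1.0 / (2.0 * d)                      # assumed reading of 1/2d
    phi = 0.0 if horizontal else math.pi / 2.0   # assumed modulation direction
    real, imag = [], []
    for y in range(-half, half + 1):
        row = [gabor(x, y, a, b, omega, phi) for x in range(-half, half + 1)]
        real.append([g.real for g in row])
        imag.append([g.imag for g in row])
    return real, imag


real_k, imag_k = gabor_kernels(half=4, a=4.0, b=6.0)
```

The real kernel is symmetric about its centre and the imaginary kernel is antisymmetric, which is the property behind their use as blob and edge detectors respectively.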

We use 6 GFs, two real (these detect object height and width)
and four imaginary (these detect the left, right, top and bottom
object edges). We developed a separate algorithm to determine
the optimal parameters for each GF separately. These 6 GF
outputs or GF filters are the P1 inputs we consider to the neural
net processor of Figure 1. The P1 to P2 weights combine these
different GF functions into N2 different macro GFs at P2. The
P2 outputs represent different detection correlation planes for
different macro GF filters. These outputs are then combined (P2
to P3 weights) into N3 fused output correlation planes at P3,
where a WTA determines if each local input scene region is an
ROI or not (and the confidence of each estimate). The P1 to P2
weights are complex-valued and the P2 nonlinearity used is a
simple square law. These are radically new neural net concepts.
We have shown that this produces general quadratic nonlinear
decision surfaces of higher rank than other methods. The P2
inputs represent the N2 new macro GW filters (combinations of
the basic 1-D GF input P1 functions); the P2 to P3 weights
represent the algorithm fusion of these N2 detection filters. Both
the filters and their output fusion are automatically selected and
are data driven. This thus represents a very intelligent and
general distortion-invariant filter synthesis and algorithm
fusion concept. Initial results show better Pc and PFA on
distorted multi-class objects than any other Gabor filter method.
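
The forward pass described above, for a single scene location, can be sketched as follows. This is only an illustration under our own assumptions: the function name `epqnn_forward` and the weight values are hypothetical, thresholds and confidence estimates are omitted, and the training procedure that selects the weights is not shown.

```python
from typing import List, Sequence, Tuple


def epqnn_forward(p1: Sequence[float],
                  w12: Sequence[Sequence[complex]],
                  w23: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], int]:
    """P1 -> P2 with complex weights and a square law, then P2 -> P3 fusion and WTA."""
    p2 = []
    for row in w12:
        z = sum(w * v for w, v in zip(row, p1))     # one macro filter output
        p2.append(z.real ** 2 + z.imag ** 2)        # square-law nonlinearity |z|^2
    p3 = [sum(w * v for w, v in zip(row, p2)) for row in w23]   # algorithm fusion
    winner = max(range(len(p3)), key=p3.__getitem__)            # winner-take-all
    return p2, p3, winner


# Six GF outputs at one location, N2 = 2 macro filters, N3 = 2 fused outputs.
p1 = [1.0, 0.0, 0.0, 0.0, 0.0, 2.0]
w12 = [[1 + 1j, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1j]]
w23 = [[1.0, 0.0],
       [0.5, 0.5]]
p2, p3, winner = epqnn_forward(p1, w12, w23)
```

Since each P2 value is the squared magnitude of a linear combination of the inputs, every P3 output is a quadratic function of the P1 inputs, which is the source of the quadratic decision surfaces mentioned in the text.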


[Fig. 1. Extended piecewise quadratic neural network
(E-PQNN) architecture.]


2.  Distorted  Object  Classification 

Feature  Space  Trajectory  (FST)  Processor 

Our distortion-invariant filters represent one method to
drastically compress a model base of different aspect views etc.
(distortions) of an object. When object class and pose are
necessary, the FST representation is very novel and useful [4,5].
In this case, different aspect views of an object are represented
by different points in feature space. Points corresponding to
adjacent aspect views are connected by straight lines. This
produces an FST (piecewise linear trajectory). Different FSTs
are produced for different objects. This is an attractive semiotic
representation for all distorted views of an object. To classify an
unknown input, we plot it as a point in feature space. The
closest FST provides the class estimate; the closest line segment
on that FST provides a pose estimate for the input. If an input
point is too far from an FST, it is rejected as clutter. This
representation overcomes many standard classifier problems
(training set size, generalization etc.) and requires very few
on-line calculations.
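
The classification rule just described can be sketched as a nearest-segment search. This is a minimal illustration under our own assumptions: the function names, the rejection threshold `reject_dist`, the two-dimensional toy features, and the linear interpolation of aspect angle along the closest segment (used here as the pose estimate) are all hypothetical choices, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vertex = Tuple[float, Sequence[float]]   # (aspect angle in degrees, feature vector)


def closest_on_segment(p: Sequence[float], a: Sequence[float],
                       b: Sequence[float]) -> Tuple[float, float]:
    """Distance from p to segment ab, and the position t in [0, 1] of the closest point."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest), t


def classify(p: Sequence[float], fsts: Dict[str, List[Vertex]],
             reject_dist: float) -> Optional[Tuple[str, float]]:
    """Return (class, pose estimate), or None if p is too far from every FST."""
    best: Optional[Tuple[float, str, float]] = None
    for label, vertices in fsts.items():
        for (ang0, v0), (ang1, v1) in zip(vertices, vertices[1:]):
            d, t = closest_on_segment(p, v0, v1)
            if best is None or d < best[0]:
                best = (d, label, ang0 + t * (ang1 - ang0))
    if best is None or best[0] > reject_dist:
        return None                      # rejected as clutter
    return best[1], best[2]


# Invented two-dimensional feature space with two object classes.
fsts = {
    "tank": [(0.0, (0.0, 0.0)), (90.0, (1.0, 0.0)), (180.0, (1.0, 1.0))],
    "truck": [(0.0, (3.0, 3.0)), (90.0, (4.0, 3.0))],
}
estimate = classify((0.5, 0.1), fsts, reject_dist=0.5)
clutter = classify((10.0, 10.0), fsts, reject_dist=0.5)
```

The on-line cost is one projection per line segment, which is consistent with the low computational load claimed for the FST.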

IR and SAR Results. For our 8-class distorted IR database,
we used 11 aspect views of each object as a training set and
produced an FST for each object using shift-invariant wedge
magnitude Fourier features [5] (each FST has only 11 vertices).
We correctly classified Pc = 95% of all test set aspect view
inputs. Of more importance, we input the 200 clutter region
false alarms (from our detection stage) to the FST processor. We
were able to reject all but one clutter input and reduce PFA
below 0.1/km².

We  also  applied  the  FST  to  SAR  data.  For  2  classes  of 
objects  in  real  SAR  (with  every  other  aspect  view  used  for 
training),  we  achieved  Pc  =  100%.  With  0  dB  SNR,  we 
achieved negligible loss (Pc = 99.77%). In both cases (for 100
of the 200 clutter chips that passed the first two stages of the
Lincoln Labs processor), we achieved perfect PFA = 0 false
alarm rejection. For a 4-class synthetic SAR object database, we
achieved Pc = 98.33% and PFA = 0.026/km². Thus, the FST is a
very  attractive  representation  and  allows  for  a  unique  processor 
using  semiotic  concepts.  It  requires  only  6.5  kbytes  of  storage 
per  object  and  only  6.5  kops  per  class. 

Object  Representation.  We  have  developed  a  technique  to 
reduce  the  number  of  vertices  (aspect  views)  necessary  to 
represent  an  object  and  to  determine  which  aspect  views  to  use. 
We have used this to analyze the representation error in our
8-class IR data (use of only 11 vertices (aspect views) per FST
produces a worst-case average representation error of only 3%).
We  used  it  to  reduce  the  number  of  training  set  images  in  our 
SAR  data  from  180  to  80  aspect  views  per  class  for  a  2  class 
real  SAR  database  (we  achieved  Pc  =  96%  with  PFA  =  0  still). 

Feature Spaces. Wedge or ring Fourier features are attractive
as they are shift-invariant. Another attractive semiotic
description of distorted object views is given by the dominant
eigenfeatures (KL); these perform better for SAR data. For
several object classes, we use FK eigenfeatures (they provide
better discrimination). We note that calculating FK features
from KL features rather than imagery is preferable (outliers are
automatically removed), that use of more KL features is not best
(outlier data is included), and that we have developed eigen
techniques to determine if a training and test set are compatible
and to estimate the Pc to be expected. We find these new
eigenspaces and new eigenanalysis methods to be new intelligent
semiotic data analysis tools.

Active Vision Uses

We have used a time sequence of views of an object over
several frames as input to the FST [5] and shown that, by
matching an input sequence to FST sections, individual
errors in separate frames (as much as 50%) could be overcome
(Pc ≈ 100%). This new semiotic representation and processor
thus also handles time sequences of data.

We  recently  [5]  also  considered  various  active  vision 
applications  of  the  FST  in  robotics.  In  active  vision,  a  sensor  (or 
robot)  acquires  a  view  of  an  object  and  then  moves  to  another 
location  to  better  learn  the  object  (or  to  grasp  it,  inspect  it,  etc.). 
In such robotics cases, the object or a model-based CAD
description of it is available and hence an infinite number of
distorted object aspect views are possible. The FST
representation  and  our  adaptive  algorithm  are  essential  here  to 
select  the  number  of  aspect  views  needed  and  which  aspect 
views.  We  have  shown  that  our  algorithm  yields  better 
representation  than  use  of  aspect  views  evenly-spaced  in  aspect 
angle.  We  have  also  developed  new  analysis  measures  to 
determine  which  aspect  view  produces  a  pose  estimate  with  the 
best  accuracy  and  which  aspect  view  provides  a  class  estimate 
with  the  best  confidence. 

We use the FST to estimate the class and pose of the object
from  a  given  view;  we  then  use  this  to  determine  where  to  drive 
the  sensor  or  robot  to  look  next  to  obtain:  a  better  aspect  pose 
estimate,  a  better  class  estimation,  a  better  view  for  inspection 
or  to  grasp  the  object,  etc. 

Higher-order  Features 

Higher-order features can provide more information
regarding input data (higher-order correlations). Nonlinear PCA
has been applied to such cases, but all techniques are iterative
and have various problems (most notably the rank of the
decision surfaces produced). We have various new intelligent
semiotic ideas on how to extract preferable nonlinear features
(that provide both representation and discrimination) and how
to calculate such discriminant functions in closed form.
Standard KL features provide only representation; standard FK
(or Fisher etc.) features provide only discrimination. To select
features that are good for both representation and discrimination,
we define a representation measure Er (using the covariance
matrix C of the data matrix) and a discrimination measure Ed
using the matrix

$$R_{12} = \sum_p (x_{1p} - x_{2p})(x_{1p} - x_{2p})^T \qquad (2)$$

where $x_{1p}$ and $x_{2p}$ are class 1 and class 2 samples. For both good
representation and discrimination, we select the dominant
eigenvectors of $C_1 + C_2 + R_{12}$ (for the two-class case) as our
features. To produce nonlinear features, we describe each data
vector in a higher-dimensional space (with higher-order
products included) and we use higher-order C and R matrices.
We solve for nonlinear higher-order features in the higher-order
H space (the solution has a closed form); the solutions are
nonlinear in the original space.


Initial tests show that this approach yields decision surfaces
of higher rank (than nonlinear PCA can), that higher-order
correlations exist in real data, and that their use improves Pc.
This is a very attractive, new (intelligent and automated)
semiotic method to nonlinearly represent and process data.
1. ABSTRACT

The theory of artificial neural networks has been
successfully applied to a wide variety of pattern recognition
problems. In this theory, the first step in computing the
next state of a neuron or in performing the next layer
neural network computation involves the linear operation of
multiplying neural values by their synaptic strengths and
adding the results. Application of a nonlinear activation
function usually follows the linear operation in order to
provide for nonlinearity of the network. In this paper we
discuss a novel class of artificial neural networks, based
on lattice algebra, in which the operations of multiplication
and addition are replaced by addition and maximum (or
minimum), respectively. By taking the maximum (or
minimum) of sums instead of the sum of products, computation
is nonlinear before the application of a nonlinear
activation function. As a consequence, the properties of these
networks, which are also known as morphological neural
networks, are drastically different from those of traditional
neural network models. The main emphasis of the results
presented here is on morphological perceptrons. Within the
constraints of this short paper, we define the basic
computational model of a morphological neuron and examine
some differences between morphological perceptrons and
traditional perceptron models.

Keywords: artificial neural networks, perceptrons,
morphological neural networks, morphological perceptrons.

2.  INTRODUCTION 

In  recent  years  a  novel  theory  of  neural  computing 
based  on  lattice  theory  has  emerged.  Network  models  based 
on  this  theory  have,  in  several  cases,  proven  to  be  superior 
to traditional models. Some references to the underlying
theory and applications of these networks can be found in
[5, 2, 1, 10, 3, 11, 8]. In this paper we restrict our attention
to  morphological  perceptrons. 

Artificial neural network models are specified by the
network topology, node characteristics, and training or
learning rules. The basic equation governing the theory
of computation in the standard neural network model is:

$$x_j(t+1) = f\left(\sum_{i=1}^{n} x_i(t) \cdot w_{ij} - \theta_j\right), \qquad (1)$$

where $x_j(t)$ denotes the value of the jth neuron at time
t, n represents the number of neurons in the network, $w_{ij}$
the synaptic connectivity value between the ith neuron and
the jth neuron, $\theta_j$ a threshold, and f the next-state
function, which usually introduces a nonlinearity into the
network. Although not all current network models can be
precisely described by this equation, they nevertheless can be
viewed as variations of this equation. In contrast, the
basic equation governing neural computation based on lattice
algebra is vastly different from the weighted sum of linear
algebra used in most neural computation. The next-state
value of a neuron is obtained from an additively weighted
maximum (or minimum) of inputs from neighboring
neurons connected to the neuron, and the resultant value is then
passed through a hard-limiter. More precisely, computation
of the next-state value of a particular neuron $N_j$ assumes
the form

$$x_j(t+1) = f\left(p_j \cdot \bigvee_{i=1}^{n} r_{ij}\,\bigl(x_i(t) + w_{ij}\bigr)\right), \qquad (2)$$

where $r_{ij} = \pm 1$ denotes the excitatory or inhibitory
influence of the ith neuron on the jth neuron, and $p_j = \pm 1$
denotes the post-synaptic response of the jth neuron to the
total received input. The values +1 and −1 denote
excitatory and inhibitory influences, respectively. In particular, if
$r_{ij} = 1$, then the ith neuron has an excitatory influence on
the jth neuron, and if $r_{ij} = -1$, then the ith neuron tries
to inhibit the firing of the jth neuron. Similarly, if $p_j = 1$,
then the post-synaptic response of the jth neuron is
excitatory, and if $p_j = -1$, then the post-synaptic response of
the jth neuron is inhibitory. The function f is a hard-limiter
defined as f(x) = 1 if x > 0 and f(x) = 0 if x < 0.
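The contrast between Equations 1 and 2 can be made concrete with a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names are hypothetical, and the value of the hard-limiter at exactly x = 0 is an assumption (set to 1 here), since the definition above does not cover that case.

```python
def hard_limiter(x):
    # Assumption: f(0) = 1. The text above only specifies x > 0 and x < 0.
    return 1 if x >= 0 else 0


def classical_neuron(x, w, theta):
    """Equation 1: hard-limited weighted sum of products minus a threshold."""
    return hard_limiter(sum(xi * wi for xi, wi in zip(x, w)) - theta)


def morphological_neuron(x, w, r, p):
    """Equation 2: hard-limited, signed maximum of sums x_i + w_i.

    r holds the pre-synaptic signs (+1 or -1) and p is the post-synaptic
    sign (+1 or -1).
    """
    total = max(ri * (xi + wi) for xi, wi, ri in zip(x, w, r))
    return hard_limiter(p * total)


if __name__ == "__main__":
    print(classical_neuron([1.0, 1.0], [0.5, 0.5], 0.75))
    print(morphological_neuron([2.0, 2.0], [-1.5, -1.5], [1, 1], 1))
```

Note that the morphological neuron performs only additions, sign changes, and comparisons; no multiplication of input values by weights occurs.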

Neurons that compute their next state using Equation 2
are also called morphological neurons, and neural networks
based on lattice theory computation are known as
morphological neural networks [8]. Figure 1 illustrates the
computational framework of neurons in a general morphological
neural net. An obvious advantage of this framework is that
it is more closely aligned with the biological model and does not
involve multiplication, thus providing for faster computation.


Figure 1. Computational model of a morphological neuron.

The basic neural computations as expressed by
Equations 1 and 2 are based on two fundamentally different
algebraic systems. Observe that the computation represented
by Equation 1 is based on the operations of the algebraic
structure $(\mathbb{R}, +, \times)$, where $\mathbb{R}$ denotes the set of real
numbers. In particular, the computation $\sum x_i w_{ij}$ is a linear
operation involving multiplication and addition. In contrast,
the basic computation occurring in a morphological
neuron, as expressed by Equation 2, is based on the semi-ring
structure $(\mathbb{R}_{-\infty}, \vee, +)$ (or $(\mathbb{R}_{\infty}, \wedge, +)$ if we replace the
operation of maximum by minimum). Here $\mathbb{R}_{-\infty}$ represents
the extended real number system $\mathbb{R}_{-\infty} = \mathbb{R} \cup \{-\infty\}$.
The basic arithmetic and logic operations in the extended
real number system are as follows. The symbol $+$
denotes the usual addition with the additional stipulation that
$a + (-\infty) = (-\infty) + a = -\infty\ \forall a \in \mathbb{R}_{-\infty}$, while the
symbol $\vee$ denotes the binary operation of maximum with
the additional stipulation that $a \vee (-\infty) = (-\infty) \vee a = a$
$\forall a \in \mathbb{R}_{-\infty}$. Note that the symbol $-\infty$ acts like a zero
element in the system $(\mathbb{R}_{-\infty}, \vee, +)$ if one views $\vee$ as
addition and $+$ as multiplication. Also, the role of the
multiplicative identity in the structure $(\mathbb{R}, +, \times)$ is played by
the number 1; i.e., $1 \cdot a = a \cdot 1 = a\ \forall a \in \mathbb{R}$. In the
structure $(\mathbb{R}_{-\infty}, \vee, +)$ this role is played by the number 0
since $0 + a = a + 0 = a\ \forall a \in \mathbb{R}_{-\infty}$.
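The stipulations above map directly onto IEEE floating-point arithmetic, which can serve as a quick sanity check. The sketch below is illustrative only; the names `oplus` and `otimes` are hypothetical labels for the semi-ring "addition" (maximum) and "multiplication" (ordinary addition).

```python
import math

NEG_INF = -math.inf  # the "zero element" of the semi-ring


def oplus(a, b):
    """Semi-ring addition: the maximum of a and b."""
    return max(a, b)


def otimes(a, b):
    """Semi-ring multiplication: the ordinary sum a + b."""
    return a + b


if __name__ == "__main__":
    a, b, c = 2.5, -1.0, 4.0
    print(oplus(a, NEG_INF) == a)          # -inf is neutral for maximum
    print(otimes(a, NEG_INF) == NEG_INF)   # -inf absorbs under addition
    print(otimes(0.0, a) == a)             # 0 plays the role of the unit
    # "Multiplication" distributes over "addition": a + max(b, c) = max(a+b, a+c)
    print(otimes(a, oplus(b, c)) == oplus(otimes(a, b), otimes(a, c)))
```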

Several advantages of morphological neural networks
over traditional neural network models have been
established in the area of associative memories. In [9] it was
shown that morphological analogues of the Hopfield
autoassociative memory have infinite storage capacity, provide
perfect output for perfect input patterns, and are more robust
in the presence of noise. It has also been proven that
morphological bidirectional associative memories easily
outperform traditional bidirectional associative memories in terms
of storage capacity, bidirectional recall, as well as recall
under noisy conditions [6]. In this paper we discuss some
of the advantages of morphological perceptrons.


3.    MORPHOLOGICAL  PERCEPTRONS 


Due to page restrictions we shall concentrate our
discussion mostly on single-layer morphological perceptrons
(SLMPs) and indicate how these can be used to construct
powerful multilayer perceptrons.

Some of the most successful associative memories
have been multilayer perceptrons. These are feedforward
nets that can be trained using such well-known algorithms
as back propagation. Some of the problems associated with
these nets are the selection of the number of hidden layers,
the choice of the number of neurons within a hidden layer,
very lengthy training procedures, and convergence problems
associated with training algorithms. In contrast,
morphological perceptrons have very simple learning algorithms that
converge in two passes, and these algorithms actually
specify the number of neurons in a hidden layer necessary for a
given classification problem [7].

The basic building blocks of morphological
perceptrons are single-layer morphological perceptrons. A
single-layer perceptron consists of an input layer of n neurons
and an output layer consisting of a single neuron that
computes a weighted sum of the input neuron values, subtracts
a threshold, and passes the result through a binary-valued
non-linearity. Each of the two possible outputs corresponds
to a different classification response. Typically, a pattern
vector $x = (x_1, x_2, \ldots, x_n)$ is presented to the input
neurons, with each input neuron assuming the value (or state)
of the corresponding vector coordinate value. The pattern
x is said to belong to class $C_0$ or class $C_1$ depending on
whether the output neuron's value is 0 or 1, respectively.
As shown in Figure 2, the topology of single-layer
morphological perceptrons mimics that of the standard single-layer
perceptron. However, computation at the output neuron is
based on the computational model given by Equation 2. In
particular, the result of the output neuron is given by

$$f\left(p \cdot \bigvee_{i=1}^{n} r_i\,(x_i + w_i)\right), \qquad (3)$$

where $x_i$ denotes the value of the ith input neuron, $w_i$
the synaptic weight between the ith input neuron and the
output neuron, $r_i$ indicates whether the input of the ith input
neuron to the output neuron is excitatory or inhibitory, and
p denotes the post-synaptic response of the output neuron.


Figure 2. Single-layer morphological perceptron.

It is not difficult to ascertain that a single-layer
morphological perceptron can provide solutions to problems that
cannot be solved by traditional single-layer perceptrons. We
provide a simple example of this.

Example 1. Suppose we want a single-layer
morphological perceptron to classify pattern vectors (2, 0), (0, 2),
(2, 2) and close neighbors to belong to class $C_1$, and pattern
vectors (1, 0), (0, 1), (1, 1) and close neighboring points to
belong to class $C_0$. Simple computation will verify that
the net shown in Figure 2 with n = 2, $w_1 = w_2 = -1.5$,
$r_1 = r_2 = 1$, and p = 1 will correctly solve this problem.
For example, using the inputs x = (0, 1) and x = (2, 2)
we obtain

$$f[(0 - 1.5) \vee (1 - 1.5)] = f(-0.5) = 0$$

and

$$f[(2 - 1.5) \vee (2 - 1.5)] = f(0.5) = 1, \qquad (4)$$

respectively. Therefore, $(0, 1) \in C_0$ and $(2, 2) \in C_1$.
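The remaining four patterns of Example 1 can be checked the same way. The sketch below evaluates Equation 3 with the weights stated in the example; the function names are hypothetical, and the hard-limiter value at exactly 0 is an assumption (it is not exercised by these inputs).

```python
def hard_limiter(v):
    # Assumption: f(0) = 1; none of the six patterns below hits this case.
    return 1 if v >= 0 else 0


def slmp(x, w, r, p):
    """Output of a single-layer morphological perceptron (Equation 3)."""
    return hard_limiter(p * max(ri * (xi + wi) for xi, wi, ri in zip(x, w, r)))


# Parameters of Example 1: w1 = w2 = -1.5, r1 = r2 = 1, p = 1.
W, R, P = (-1.5, -1.5), (1, 1), 1

CLASS_1 = [(2, 0), (0, 2), (2, 2)]
CLASS_0 = [(1, 0), (0, 1), (1, 1)]

if __name__ == "__main__":
    for pattern in CLASS_1 + CLASS_0:
        print(pattern, "-> class", slmp(pattern, W, R, P))
```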

An interesting observation is that the single-layer
perceptron of this example provides a solution
for a problem that cannot be solved by the
traditional single-layer perceptron model since the two
classes of patterns are not linearly separable. This
observation follows from the fact that the equation
$\pm[r_1(x_1 + w_1) \vee r_2(x_2 + w_2) \vee \cdots \vee r_n(x_n + w_n)] = 0$
represents an (n − 1)-dimensional surface, called the
perceptron's decision boundary, which divides Euclidean
n-space into two regions. The regions separated by the
decision boundary correspond to the output neuron's on-off
regions, as any input from one region will always result in
an on state (value 1) of the neuron while any input from
the other region will result in an off state (value 0) of the
neuron. For instance, the decision boundary generated by
the perceptron of Example 1 is the infinite step function
shown in Figure 3. Any point in the quarter-plane bounded
by the lines $x = -w_1$ and $y = -w_2$ will result in a 0-state
value if used as input to the perceptron. Therefore, this
perceptron could solve the two-class problem illustrated in
Figure 3, where patterns of one class belong to the region
marked "class 0" and patterns of the other class belong
to the region marked "class 1."


Figure 3. Decision boundary for the single-layer
morphological perceptron of Example 1 (regions marked
"class 0" and "class 1").


Single-layer morphological perceptrons are the basic
building blocks of multilayer morphological perceptrons.
Thus, in order to construct multilayer morphological
perceptrons, it is necessary to have a thorough understanding
of SLMPs. The output neurons of SLMPs form the hidden
neurons of multilayer morphological perceptrons. The
decisions made by these hidden neurons depend on their
synaptic responses to inputs as well as the synaptic weights of
the input axons. These responses and weights determine
the decision boundaries for the neuron's on-off regions.
For a two-input (n = 2) SLMP, the simplest
decision boundary is obtained by setting one of the weights,
$w_1$ or $w_2$, equal to $-\infty$. If $w_1 = -\infty$, then the
decision boundary is the horizontal line $y = -w_2$, and if
$w_2 = -\infty$, then the decision boundary will be the vertical
line $x = -w_1$. Figure 4 illustrates these two possibilities
for $r_1 = r_2 = 1$ and p = 1. Changing the value of p
to p = −1 interchanges the two classes $C_0$ and $C_1$ since
the on region becomes the off region and vice versa. If
$r_1 = 1$ and $w_2 = -\infty$, then changing $r_1$ to $r_1 = -1$ will
also interchange the neuron's on-off regions. However, it
is important to note that if $w_i = -\infty$, then we must have
$r_i = 1$ since $-(x_i - \infty) = \infty - x_i = \infty$, which is outside
the computational framework of the algebra $(\mathbb{R}_{-\infty}, \vee, +)$.
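A weight of −∞ effectively disconnects an input, which is why the boundary collapses to a single line. The following sketch illustrates this with floating-point infinity; the weight value −1.0 and the function names are hypothetical, and f(0) = 1 is again an assumption.

```python
import math

NEG_INF = -math.inf


def hard_limiter(v):
    # Assumption: f(0) = 1.
    return 1 if v >= 0 else 0


def slmp(x, w, r=(1, 1), p=1):
    """Two-input single-layer morphological perceptron (Equation 3)."""
    return hard_limiter(p * max(ri * (xi + wi) for xi, wi, ri in zip(x, w, r)))


# w1 = -inf, w2 = -1.0: the decision boundary is the horizontal line y = 1.
W_HORIZONTAL = (NEG_INF, -1.0)

if __name__ == "__main__":
    # The first coordinate has no effect on the output.
    print(slmp((100.0, 0.5), W_HORIZONTAL), slmp((-100.0, 0.5), W_HORIZONTAL))
    print(slmp((100.0, 1.5), W_HORIZONTAL), slmp((-100.0, 1.5), W_HORIZONTAL))
    # Combining w_i = -inf with r_i = -1 produces +inf, which leaves the algebra.
    print(-(3.0 + NEG_INF))
```

With p = −1 the same weights give the complementary classification, matching the interchange of on and off regions described above.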
Figure 4. Decision boundaries for two-input,
single-layer morphological perceptron
having one synaptic weight equal to $-\infty$.


The two SLMPs shown in Figure 4 can be combined to
form a two-input multilayer morphological perceptron with
one hidden layer as shown in Figure 5. The output neuron
of this multilayer perceptron computes the maximum of its
two hidden layer input neurons. Hence, a pattern point will
belong to class $C_0$ if and only if it was classified by both
single-layer perceptrons as belonging to class $C_0$. It is easy
to see that this multilayer perceptron is equivalent to the
single-layer perceptron shown on the bottom of Figure 5 if
$w_1 = w_{11}$ and $w_2 = w_{22}$; a pattern vector $x = (x_1, x_2)$ is
classified by both perceptrons as belonging to class $C_0$ if
and only if $x_1 < -w_1$ and $x_2 < -w_2$.
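The equivalence claimed above can be checked numerically on a grid of test points. This is a sketch under stated assumptions: the weight values are arbitrary, the function names are hypothetical, and f(0) = 1 is assumed so that boundary points fall in class 1, consistent with the strict inequalities in the text.

```python
import math

NEG_INF = -math.inf


def hard_limiter(v):
    # Assumption: f(0) = 1.
    return 1 if v >= 0 else 0


def slmp(x, w):
    """Two-input SLMP with r1 = r2 = 1 and p = 1."""
    return hard_limiter(max(xi + wi for xi, wi in zip(x, w)))


def two_layer(x, w11, w22):
    """Hidden layer: the two SLMPs of Figure 4; output neuron: their maximum."""
    h1 = slmp(x, (w11, NEG_INF))   # vertical-line boundary x1 = -w11
    h2 = slmp(x, (NEG_INF, w22))   # horizontal-line boundary x2 = -w22
    return max(h1, h2)


def equivalent_on_grid(w11, w22, lo=-4, hi=4, step=0.5):
    """Compare the two-layer net with the single SLMP having w1 = w11, w2 = w22."""
    n = int((hi - lo) / step) + 1
    points = [(lo + i * step, lo + j * step) for i in range(n) for j in range(n)]
    return all(two_layer(x, w11, w22) == slmp(x, (w11, w22)) for x in points)


if __name__ == "__main__":
    print(equivalent_on_grid(-1.5, 0.5))
```

The agreement follows from the monotonicity of the hard-limiter: taking the maximum before or after thresholding gives the same result.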

The preceding discussion shows that a two-input
SLMP can have one of three distinct types of basic
decision boundaries: a horizontal or vertical line, or an infinite
step function formed by a vertical and horizontal half-line.
Figure 6 shows the four possible types of infinite step
functions for two-input SLMPs.


Figure 5. Two equivalent morphological
perceptrons. Here $w_1 = w_{11}$ and $w_2 = w_{22}$.


Figure  6.  Decision  boundaries  for  two-input, 
single-layer  morphological  perceptrons 
with  real-valued  synaptic  weights. 

The two algebras $(\mathbb{R}_{-\infty}, \vee, +)$ and $(\mathbb{R}_{\infty}, \wedge, +)$ are
dual algebras. The simple isomorphism $h : \mathbb{R}_{-\infty} \to \mathbb{R}_{\infty}$
defined by $h(r) = -r\ \forall r \in \mathbb{R}_{-\infty}$ defines this duality.
Thus, it does not matter which of these two algebras we
use for the underlying mathematical foundation of
morphological neural networks. This duality can be used for
interchanging classes. For example, negating neural inputs
(i.e., interchanging $r_i$ with $-r_i$) and interchanging the
operation of maximum with minimum interchanges the classes
$C_0 \leftrightarrow C_1$. This interchange of classes is illustrated in
Figure 7. Of course, as mentioned earlier, in order to
interchange the on-off regions we could have simply used an
inhibitory post-synaptic response for the output neuron of
the perceptron on the left side of Figure 7. This follows
from the fact that

$$f[-(-(x_1 + w_1) \vee (x_2 + w_2))] = f[(x_1 + w_1) \wedge -(x_2 + w_2)]. \qquad (5)$$
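The identity inside Equation 5 can be spelled out in one line. The derivation below is a reader's aid rather than part of the source; it uses only the fact that negation turns a maximum into a minimum, with the shorthand $a = x_1 + w_1$ and $b = x_2 + w_2$ introduced here for brevity.

```latex
-\bigl((-a) \vee b\bigr)
  \;=\; \bigl(-(-a)\bigr) \wedge (-b)
  \;=\; a \wedge (-b)
  \;=\; (x_1 + w_1) \wedge -(x_2 + w_2).
```

Applying the hard-limiter f to both ends gives Equation 5.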


Figure 7. Two SLMPs with identical decision
boundaries but reversed on-off regions.

In 3-dimensional space lines are replaced by planes
and half-lines by half-planes. For example, setting
$w_1 = w_2 = -\infty$ and $w_3 \in \mathbb{R}$ yields the plane $z = -w_3$ as
the decision boundary for the corresponding three-input,
single-layer morphological perceptron. Hence, by
choosing $w_i \in \mathbb{R}$ for some i = 1, 2, or 3, and setting the
remaining two weights equal to $-\infty$, three basic types of
planar decision boundary can be obtained. In analogy to
the decision boundaries shown in Figure 4, each of these
2-dimensional decision boundaries will be a plane normal
to one of the coordinate axes and parallel to the remaining
two axes. Using two such decision surfaces (i.e., setting
only one of the weights equal to $-\infty$) a more complex
decision boundary can be formed. This decision
boundary corresponds to a 2-dimensional step function. For
instance, if $w_1 = -\infty$ and $w_2, w_3 \in \mathbb{R}$, the decision
boundary divides 3-space into two regions, namely the region
$\{(x, y, z) : y < -w_2,\ z < -w_3,\ x \in \mathbb{R}\}$ and its
complement if all pre- and post-synaptic influences are positive.
Since any two of the decision planes will intersect in a
line, 12 such basic 2-dimensional step functions can be
constructed, yielding 24 possible on-off regions.

An  even  more  complex  decision  boundary  is  obtained 
when  all  three  weights  are  real-valued.  In  this  case,  the 
decision  boundary  corresponds  to  the  infinitely  extended 
boundary  of  a  three-sided  rectangular  box.  These  analogies 
carry  over  into  higher  dimensional  spaces. 

Two-layer morphological perceptrons can solve any
class problem in which the classes are compact sets. This
is the case in basically all realistic pattern classification
problems, where the patterns of a given class belong to
some bounded set. For instance, in the plane, any compact
set can be approximated arbitrarily closely by rectangles.
Thus, decision surfaces can be obtained via horizontal and
vertical lines to form any set of approximating rectangles. This
concept extrapolates to higher dimensional compact sets.
For example, the two-input, 2-layer perceptron shown in
Figure 8 will classify all patterns falling within the set
bounded by the lines $x_1 = -w_{11}$, $x_1 = -w_{12}$, $x_1 = -w_{13}$,
$x_2 = -w_{21}$, $x_2 = -w_{22}$, and $x_2 = -w_{23}$ shown in
the bottom half of Figure 8 as belonging to class $C_0$.
It should be clear that any planar configuration can be
boxed in by such sequences of step functions. The number
of neurons in the hidden layer depends, of course, on
the number of step functions required to approximate a
region. Another interesting observation is that the class $C_0$
in this example is not convex, showing that a single hidden
layer is sufficient for both convex as well as non-convex
regions. This is in contrast to standard perceptrons that
require two hidden layers for non-convex class decisions
[4]. Note also that if we eliminate the third neuron $N_3$
in the hidden layer, then the new net would classify all
points in the infinite, non-convex and non-compact region
$R = \{(x_1, x_2) : x_2 > -w_{23}$ if $-w_{11} < x_1 < -w_{12}$,
or $-w_{21} < x_2 < -w_{22}$ if $x_1 > -w_{12}\}$ as belonging to
class $C_0$. Thus, multilayer morphological perceptrons can
also classify patterns belonging to unbounded regions.
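The simplest instance of "boxing in" a region is a single open rectangle. The sketch below is not the network of Figure 8, whose weights are not given in the text; it is a hypothetical two-hidden-neuron construction that uses one step function with excitatory inputs and one with inhibitory inputs, joined by a maximum at the output neuron. The rectangle bounds and function names are assumptions, as is f(0) = 1.

```python
def hard_limiter(v):
    # Assumption: f(0) = 1, so points on the boundary fall in class 1.
    return 1 if v >= 0 else 0


def slmp(x, w, r, p=1):
    """Single-layer morphological perceptron (Equation 3)."""
    return hard_limiter(p * max(ri * (xi + wi) for xi, wi, ri in zip(x, w, r)))


def rectangle_net(x, lower, upper):
    """Return 0 (class C0) iff lower[i] < x[i] < upper[i] for i = 0, 1.

    Hidden neuron 1 (r = +1, w = -upper) is off iff x lies below and to the
    left of the upper corner; hidden neuron 2 (r = -1, w = -lower) is off iff
    x lies above and to the right of the lower corner. The output neuron
    takes the maximum, so it is off only when both hidden neurons are off.
    """
    h_upper = slmp(x, (-upper[0], -upper[1]), (1, 1))
    h_lower = slmp(x, (-lower[0], -lower[1]), (-1, -1))
    return max(h_upper, h_lower)


if __name__ == "__main__":
    lower, upper = (0.0, 0.0), (2.0, 1.0)
    for point in [(1.0, 0.5), (3.0, 0.5), (1.0, -0.5), (-1.0, 0.5)]:
        print(point, "-> class", rectangle_net(point, lower, upper))
```

Non-convex regions such as the one in the example above require combining several such step functions, which this sketch does not attempt.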


Figure 8. Non-convex decision boundary
of a 2-layer morphological perceptron.

Using the different decision surfaces that are provided
by the single-layer morphological perceptron discussed
earlier, it will be an interesting and instructive exercise for
the reader to construct 2-layer morphological perceptrons
whose decision boundaries approximate the boundaries of
different geometric regions in the plane.
4. CONCLUSION

This paper introduces neural network computing based
on lattice algebra, which provides a new paradigm for
artificial neural networks. The resulting morphological networks
are radically different in behavior from traditional artificial
neural network models. One advantage of using
morphological neural networks over traditional artificial neural
networks is that the algebra $(\mathbb{R}_{-\infty}, \vee, +)$ does not involve the
operation of multiplication and thus provides for greatly
reduced computational overhead. Additionally, due to the
global maximum operation, the usual convergence problems
that exist in most of the traditional approaches do not occur
in the lattice algebraic approach. For example, in [7] we
showed that training algorithms for morphological
perceptrons converge in at most two steps. Furthermore, these
training algorithms also add all necessary neurons in the
hidden layer. Our main emphasis in this paper was on
morphological perceptrons. We laid the foundations of single-layer
morphological perceptrons and indicated how these can be
used to construct multilayer morphological perceptrons.
ABSTRACT

In  the  last  few  years  neurocontrol  has  made 
enormous  progress,  in  terms  of  new  engineering 
applications,  new  mathematical  designs  and  ideas,  and 
new  links  to  the  brain.  The  success  of  real  life 
applications  depends  on  a  vast  variety  of  factors,  from 
general  design  ideas  down  to  programming  details. 
Sometimes  it  is  not  easy  to  find  the  real  reason  for 
success  or  failure  in  a  particular  application,  because  the 
usual  approach  to  neural  modeling  is  somewhere 
between  art  and  science.  So,  it's  time  to  demystify  the 
process  of  design,  from  idea  to  implementation  to 
interpretation  of  results.  There  exists  a  set  of  benchmark 
problems  described  in  [1]  which  became  a  de  facto 
standard  for  exploring  the  possibilities  of  new  modeling 
and  control  methods.  However,  most  of  these  problems 
(even  though  they  were  intentionally  designed  for 
instructive purposes) can be implemented in
different  ways,  which  complicates  the  comparison  of 
results.  The  goal  of  this  paper  is  to  demonstrate  some 
"underwater  rocks"  in  the  application  of  neurocontrol 
methods  using  standard  benchmark  problems  as  a 
commonly  available  touchstone. 

KEYWORDS: neurocontrol, benchmark
problems, model evaluation.

1.  INTRODUCTION 

Adaptive  Critic  Designs  (ACD)  [2,3],  one  of 
the  most  powerful  tools  for  solving 
multidimensional  optimization  problems,  are 
gaining growing popularity in the research community
[4-8,  etc].  It  has  been  demonstrated  that  ACD 
methods  allow  one  to  solve  real-life  problems  with 
large  numbers  of  state  variables  with  better 
performance  than  traditional  methods  allow. 

According  to  the  definition  of  a  "brain-like" 
system,  as  suggested  in  [9,10],  it  should  contain  at 
least  three  major  general-purpose  adaptive 
components:  (1)  an  Action  or  Motor  system, 
capable  of  outputting  effective  control  signals  to 
the  plant  or  environment;  (2)  an  "Emotional"  or 
"Evaluation"  system  or  "Critic,"  used  to  assess  the 
long-term  costs  and  benefits  of  near-term 
alternative  outcomes;  (3)  an  "Expectations"  or 
"System  Identification"  component,  which  serves 
as a model or emulator of the external environment
or  of  the  plant  to  be  controlled.  Therefore,  3 
separate  but  interacting  subsystems  are  considered: 
Critic,  Model,  and  Action. 

The  implementation  of  these  subsystems  is 
always  a  matter  of  serious  choice.  In  the  most 
general  case,  each  of  them  may  be  implemented  as 
some  kind  of  function  approximator.  Of  course, 
neural  networks  can  be  regarded  as  universal 
approximators  [11,12];  thus,  in  the  majority  of 
applications the Model is chosen to be a neural
network  with  straight  feedforward  or  recurrent 
architecture.  However,  in  industrial  processes  some 
kind  of  analytical  model  is  usually  available.  So  in 
this  paper  we  use  analytical  models  to  illustrate  the 
basic  concepts  of  design.  It  is  clear  that  such 
models  in  real  applications  might  be  incomplete, 
contradictory  or  ambiguous  (sometimes  they  are 
such  even  in  test  cases),  but  end  users  most  often 
prefer  to  have  first  principles  or  empirical  models 
used  directly  in  the  development  of  an  overall 
control  system,  which  is  quite  reasonable  according 
to  the  concept  of  utilization  of  all  available 
information. 

The  organization  of  the  paper  is  as  follows: 
Section  2  briefly  reviews  the  logic  of  dual  heuristic 
programming  (DHP)  used  here  as  the  example  of 
ACD  design,  and  backpropagation  through  time. 
These  methods  are  applied  in  Section  3  which 
describes  ACD  for  2  traditional  benchmark 
problems:  pole  balancing  on  a  moving  cart,  and 
aircraft  autolander.  Section  4  summarizes  the 
results  of  computer  experiments  with  these 
benchmark  problems,  and  suggests  directions  for 
further  development.  Finally,  the  conclusions  are 
drawn. 


* The views herein are those of the author, not
those of NSF.
2. DHP AND BACKPROPAGATION
THROUGH TIME AS GENERAL METHODS.

A  complete  description  of  backpropagation 
through  time  (BTT)  can  be  found  in  [13],  and  that 
for  various  ACD  including  Dual  Heuristic 
Programming (DHP) in [3]. It is worth mentioning
that  the  description  in  this  source  (Handbook  of 
Intelligent  Control)  is  complete,  but  concise,  as  is 
determined  by  the  nature  of  a  Handbook. 
Therefore,  almost  every  subsequent  publication  has 
added  something  from  the  author's  understanding 
of  those  methods.  We  review  here  the  main  ideas 
of  BTT  and  DHP  just  for  the  convenience  of 
readers. 


Fig. 1. An outline of DHP. [Diagram blocks: Critic, Model,
Utility, Action; signals: R(t), R(t+1), the Critic output
λ(t+1) = ∂J(t+1)/∂R(t+1), and the target λ*(t).]

DHP is a procedure for adapting the network
that approximates the derivatives of the
secondary,  or  strategic  utility  function  J  (Critic 
network).  The  primary  utility  function  U  is 
supplied  by  the  user.  The  process  of  network 
adaptation  is  governed  by  dual  subroutines  which 
backpropagate  derivatives  from  the  Model  and  the 
Action  as  shown  in  Fig.  1  reproduced  from  [3]  (the 
forward  flow  of  information  is  drawn  in  solid  lines; 
the  backward  flow  is  in  dotted  lines). 

The Backpropagation of utility, on the
other hand, is an exact and straightforward method
for adaptation of the Action weights based on
explicit calculation of the utility function N steps
forward in time. Usually it is used for off-line
learning, but in the test example considered here
(aircraft autolander) it can be implemented as an
on-line method, because the control interval is only 10
times larger than the sampling interval. Thus, if the
system is able to calculate the utility function 10
steps forward reasonably quickly, it can adjust the
weights in the Action network between two
consecutive control moments.

3.  COMPUTER  EXPERIMENTS 
WITH  BENCHMARK  PROBLEMS. 

In  both  cases  presented  in  Section  3,  the 
model of the dynamical system is described as a set of
equations,  not  as  a  neural  network  or  other 
approximator.  If  the  model  is  exact,  it  allows  us  to 
avoid  prediction  errors,  and  also  to  get  exact 
derivatives  through  the  model,  which  are  necessary 
to  adapt  the  Critic  and  Action  networks. 

3.1.  Inverted  pendulum  on  a  moving  cart: 
balancing  forever. 

The pole-balancing problem is one of the
simplest test problems in classical control theory
[14], reinforcement learning [15], and artificial
intelligence [16]. The specific goals of researchers
have been different: from training the pole-and-cart
system to keep its balance with only a fail signal
[15], to demonstrating the optimal control
strategy [17]. Here we consider an intermediate
task, in which the Critic network learns on-line to
evaluate a selected utility function, so that the
resulting controller (the trained Action network) is
able to balance the pole without evaluating the
situation, using just the vector of state variables.

For our numerical experiments we have used
the equations from [15] and their finite difference
approximation with a time step of 0.02 sec.
The control interval is equal to the sampling
interval, i.e. force can be applied every 0.02 sec, and
during this interval it is constant. A description of
model parameters and their default values is given
in Table 1.

Parameter   Value           Comments
g           9.81 m/s^2      acceleration due to gravity
            1.0 kg          mass of cart
m           0.1 kg          mass of pole
l           0.5 m           half-pole length
            0.0005          coefficient of friction of cart on track
            0.000002        coefficient of friction of pole on cart
f(t)        -10 N to 10 N   force applied to cart's center of mass at time t
δ           0.02 sec        sampling interval

Table 1. Description and default values of
parameters for pole balancing problem.
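A finite-difference step of the kind described above might look like the sketch below. It assumes the widely used cart-pole equations with friction and a simple Euler update; whether this matches the exact form in [15] is an assumption, and the constant and function names are illustrative. The numerical values are the defaults from Table 1.

```python
import math

G = 9.81             # acceleration due to gravity, m/s^2
M_CART = 1.0         # mass of cart, kg
M_POLE = 0.1         # mass of pole, kg
HALF_LEN = 0.5       # half-pole length, m
MU_CART = 0.0005     # coefficient of friction of cart on track
MU_POLE = 0.000002   # coefficient of friction of pole on cart
DT = 0.02            # sampling interval, sec
F_MAX = 10.0         # force limit, N


def step(state, force, g=G, dt=DT):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step.

    The force is clipped to [-F_MAX, F_MAX] and held constant over dt.
    """
    x, x_dot, theta, theta_dot = state
    force = max(-F_MAX, min(F_MAX, force))
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    sgn = (x_dot > 0) - (x_dot < 0)  # sign of the cart velocity

    temp = (-force - M_POLE * HALF_LEN * theta_dot ** 2 * sin_t
            + MU_CART * sgn) / total
    theta_acc = ((g * sin_t + cos_t * temp
                  - MU_POLE * theta_dot / (M_POLE * HALF_LEN))
                 / (HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total)))
    x_acc = (force + M_POLE * HALF_LEN
             * (theta_dot ** 2 * sin_t - theta_acc * cos_t)
             - MU_CART * sgn) / total

    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)


# Without control, a small initial tilt grows: the upright state is unstable.
state = (0.0, 0.0, 0.05, 0.0)
for _ in range(50):
    state = step(state, 0.0)
```

Passing a different g to step() is enough to repeat the run under reduced gravity.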

A  principal  difference  from  the  previous 
work  consists  in  the  choice  of  utility  function  U: 

U = -f(t)(10 \sin\theta + \sin\dot{\theta} + x/25) - 0.5\theta^2 - 0.5\dot{\theta}^2 - 0.1x^2 - 0.5\dot{x}^2

In contrast with trial-and-error learning
[15], we give the system the possibility to
evaluate its state from the viewpoint of some
external observer. In this approach the system
should not fail at all if the utility function is
defined correctly, if the system evaluates its
state frequently enough to have time to respond,
and if the range of the control signal is wide
enough. In fact, under such conditions a dynamical
system guided by DHP should demonstrate
one-trial learning, and it does.

With the pole starting in an upright position,
after an arbitrary force (about 2 N) is applied
the system actually converges to the neighborhood
of a stable state (zero angle from vertical, fixed
coordinate) after 3000 time intervals, i.e. in 60
seconds. That is what it costs to learn the balancing.
This process is shown in Fig. 2, where it can be
seen how the contradictory requirements of
stabilizing with respect to the angle and to the
coordinate are satisfied. The oscillations of the
utility function correspond to oscillations of the
force (control action) when the system changes
priorities: after stabilizing the angle it turns to
stabilizing the coordinate of the cart.

Fig. 2. Pole angle (a) and cart coordinate (b) during
training phase. Time interval is equal to 0.02 sec.

In this picture one can see a periodic
increase of angle and coordinate amplitude with
time. It sounds strange, but this is exactly what was
desired: not to maintain the upright pole position,
because that is unrealistic for an inherently unstable
system, but to maintain a proper frequency of
oscillations, or a combination of a few frequencies,
because the system is optimizing several
parameters simultaneously: angle, angular
velocity, coordinate of cart, and cart velocity.

In training mode the controller balanced
the pole for at least 45 minutes without any growth
of oscillations. So the initial goal, to keep balancing
forever given the possibility to evaluate the system
state, was reached in one trial.

Comments  on  choosing  U function. 

A  standard  caveat  in  ACD  description  is  that 
the  utility  function  U  is  essentially  problem 
dependent,  therefore  the  correct  definition  of  U  is 
crucial  for  the  success  of  the  application.  There  is 
also  a  source  of  possible  danger,  however. 
Consider,  for  example,  our  expression  for  the  U 
function  in  the  inverted  pendulum  problem  (pole 
balancing).  It  seems  relatively  simple  and  natural; 
much  more  complicated  expressions  may  be 
derived  when  one  attempts  to  incorporate  human 
experience [4]. However, this is false simplicity: in
fact, the major term in the utility function
implicitly refers to a desired control law. Even
without learning, it would be enough to choose the
external force as

to  push  the  system  into  the  equilibrium  state  [18]. 

Though  it's  correct  only  when  there  are  no 
constraints  on  x  coordinate  (as  in  the  cited 
reference),  this  example  clearly  demonstrates  that 
the  utility  function  should  reflect  general 
optimization  requirements  like  energy 
minimization,  being  in  the  neighborhood  of  desired 
point,  and  so  on  (as  does  the  utility  function 
implemented  here).  As  can  be  seen  from  data 
analysis,  at  the  beginning  of  training  the  derivative 
of  the  utility  function  with  respect  to  control 
actually  determines  the  direction  of  control 
changes,  then  it  becomes  constant,  and  if  the 
dynamical  system  still  followed  such  instructions  it 
would  move  away  from  the  neighborhood  of 
equilibrium  point.  Starting  from  approximately  50 
sec  of  system  time  the  influence  of  derivatives 
through  the  model  is  more  noticeable,  and  after  that 
moment  those  derivatives  (and,  in  fact,  other 
components  of  utility  function)  determine  the 
changes  of  control  action. 

Generalizing  abilities. 

After the system had learned pole balancing
starting from the upright position, the problem was
made a bit more complicated. With the same
arbitrary starting control action it had to learn
convergence to the stable mode starting from a
small angle. Using the Action and Critic networks
trained for zero-starting balancing, it was possible
to train them further, gradually increasing the
starting angle to maintain one-trial learning. At
0.6 rad training was stopped. The Action network
obtained at this step was later used as the control
network for test examples. In fact, the trained
Action network was able to stabilize the
cart-and-pole system with a starting angle of up to
0.68 rad, for which it had not been trained.


Fig. 3. Stabilizing the system starting from the
angle 0.6 rad. Time interval is equal to 0.02 sec.

The  results  of  a  test  for  starting  angle  0.6  rad 
are  shown  in  Fig.  3.  After  20  sec  of  system  time  the 
pole  angle  reached  0  (actually,  small  oscillations 
around  the  zero  point  were  maintained).  Stabilizing 
of  the  cart  coordinate  took  about  1  min  of  system 
time. 

The next question was: is the controller truly
adaptive, i.e. what if we change the parameters of
the pole and cart (mass, length of pole) or external
parameters (friction, gravity)? Yes, it was really
able to re-learn; moreover, this adaptation to new
conditions occurred instantly (Fig. 4).

Fig. 4. Cart coordinate for training of the system
with variable parameters. Time interval is equal to
0.02 sec.

Learning was started from the standard
parameter set defined in Table 1. After 2 min of
system time the mass of the pole was changed to
0.3 kg, and 2 min later it was changed to 0.05 kg
without interrupting learning. As can be seen in
the chart for the cart coordinate, these changes
(points 6000 and 12000) were hardly noticeable.
After another 2 min of system time the length of
the pole was changed to 3 m and the mass of the
pole was set back to the standard 0.1 kg. This time
the change was noticeable (point 18000), which
agrees with human experience: the ease of
balancing depends more on pole length than on its
mass. At point 24000 (after the next 2 minutes of
system time) all parameters were reset to their
standard values, except that gravity was decreased
10 times. The general structure of the learning
process remained the same, but the frequency of
oscillations decreased, which was entirely expected
according to physical laws.

Fig. 5. Action strategy for larger control interval.
Time interval is equal to 0.05 sec.

The last computer experiment that can be
done with our controller is to put it into really hard
conditions. It was trained for a control interval of
0.02 sec; let us now assume that it is possible to
apply control actions only every 0.05 sec. It can be
seen from Fig. 5 that even with such a restriction,
the Action network provides reasonable control
without retraining, though the control strategy is
quite different: instead of small oscillations around
zero angle the system maintains a stable mode with
a large amplitude of oscillations.

Comparing our results with those of
Balakrishnan [17], we can conclude that the
resulting controller shows behavior close to
optimal control. It can be seen that the choice of a
different utility function does not influence the
nature of the stand-alone controller. The utility
function is important only during the training stage;
the resulting Action network should be close to the
optimal control law, whatever the learning
method. Thus, the most general type of utility
function, derived, for example, from energy
conserving conditions (minimal intensity of
control) or a desired set point description (minimal
square of all state variables), should be preferred
over the fancy "problem-dependent" functions.

Shaping:  an  easy  way  to  train  the  controller. 

Now we come to the exploitation of the
scale-invariance of the model. Since the system
should stabilize both pole angle and cart coordinate,
the simplest utility function is

U = -0.5\theta^2 - 0.5(x/x_{max})^2

where the coefficients are chosen so as to simplify
the derivatives of the utility function.
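To see why the factor 0.5 is convenient, the derivatives can be written out. This short derivation reads the utility above as U = -0.5θ² - 0.5(x/x_max)², which is an interpretation of the printed formula; under that reading the quadratic terms differentiate without extra numerical factors:

```latex
\begin{align*}
U &= -0.5\,\theta^{2} - 0.5\left(\frac{x}{x_{\max}}\right)^{2},\\
\frac{\partial U}{\partial \theta} &= -\theta,\qquad
\frac{\partial U}{\partial x} = -\frac{x}{x_{\max}^{2}}.
\end{align*}
```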

However, it is very difficult to train the
Action and Critic networks with such a general
utility function. With a high learning rate the force
oscillates, but with a low learning rate it is too slow
to react properly. Note that now the system should
learn to react only to target variables, not to their
derivatives.

One technique that is very efficient in such
situations is called "shaping" in [13]. In shaping,
one first adapts a simpler controller to a simplified
version of the problem, then uses the weights
of the resulting controller as the initial values of the
weights of the controller for the more complex
problem, and so on up to the required level of
complexity.

It is clear which critical parameter
determines the complexity of pole-and-cart control:
gravity. So the training was started from
g = 0.0981 m/sec^2; then, after convergence (about
60 sec of system time), this parameter was set to
g = 0.981 m/sec^2. This time training took a bit
longer, but still after 5 trials the system learned to
keep its balance starting from the zero position and
a random (about 2 N) initial force, as is shown in
Fig. 6.

Fig. 6. Training phase for g = 0.981 m/sec^2, general
type of utility function: angle (a), x-coordinate (b).
Time interval is equal to 0.01 sec.

After that the system was able to stabilize the
pole with a starting angle of 0.5 rad without
re-learning, with the same high learning rates of the
Action and Critic networks. Before, with a
problem-dependent utility function, an increase of
the starting angle from 0.1 to 0.6 rad required a
decrease of the learning rate from 0.7 to 0.1.
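The gravity schedule used for shaping can be sketched as a loop that carries the weights from one stage to the next. Everything here is illustrative: `train_stage` stands for whatever Action/Critic training routine is in use, and `toy_train_stage` is only a stand-in so the sketch runs. The schedule assumes training ends at the standard gravity from Table 1.

```python
def shape(train_stage, weights, gravity_schedule=(0.0981, 0.981, 9.81)):
    """Train on progressively harder problems, reusing the weights each time."""
    history = []
    for g in gravity_schedule:
        # Each stage starts from the weights left by the previous, easier one.
        weights = train_stage(weights, g)
        history.append((g, list(weights)))
    return weights, history


def toy_train_stage(weights, g):
    """Stand-in for real training: moves each weight halfway toward g."""
    return [w + 0.5 * (g - w) for w in weights]


final_weights, history = shape(toy_train_stage, [0.0, 0.0])
```

The point of the design is that no stage starts from random weights, so each harder problem begins near a controller that already works on the easier one.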

The essential point in the training process,
regardless of the utility function (and mostly
regardless of the concrete example), was that there
is a difference between "convergence for training
with possibility of weights correction" and
"convergence for stand-alone action". In the first
sense, the Action+Critic system converged after
at most 12000 iterations (120 sec of system time,
see Fig. 6). However, Action alone converged, in
the second sense, only 30000 iterations later.
Moreover, the controller for g = 0.981 m/sec^2 was
able to control the pole-and-cart system up to the
value g = 4.905 m/sec^2.

Then the same shaping procedure was
repeated for the standard value of gravity. After a
few trials the system learned the new external
conditions, and at the end the Action network was
again able to control the pole-and-cart system
autonomously. For this last case the utility function
was slightly changed to give more importance to
the changes in the x-coordinate.

None of the ACD pole balancer descriptions
cited here (except, possibly, [8], whose
implementation we have not yet been able to
evaluate in detail) was able to learn a stable
stand-alone controller working with a simple
general-type utility function under the conditions of
full gravity. Like Balakrishnan [17], we were able
to obtain an optimal control strategy under reduced
gravity (g = 0.981 m/sec^2); however, we then
scaled it up to real-life conditions. Shaping as a
learning tactic was extremely useful and
time-efficient for this purpose.

3.2.  AIRCRAFT  AUTOLANDER:  THE 
SIMPLEST  DESIGN  IS  ENOUGH. 

The Autolander problem described in [1]
concerns the simulation of aircraft landing. This
linearized model of a commercial aircraft and its
environment includes such variables as longitudinal
velocity, vertical angular velocity, pitch rate and
angle, altitude, horizontal position, autothrottle
state, and wind disturbance state. The pitch angle
command is used as the control action. The
equations, parameters, and constraints are those
from [1], problem A.3.

The process of landing is guided by an
airport-based Instrument Landing System (ILS),
which provides desired values of the height and
vertical velocity of the aircraft, and in this example
the controller must track the prespecified
trajectory. In fact, it must learn the proper approach
angle relative to the runway threshold, which
should be maintained during the first phase of
landing. During the second phase, starting at a
height of 45 feet, the aircraft follows the
exponential law, decreasing both altitude and
vertical velocity, and elevating the nose.

As  was  said  before,  almost  every  model 
allows  some  undesirable  freedom  at  the 
implementation  stage,  which  can  be  treated  in 
different  ways.  First,  we  re-defined  the 
intermediate  variable  aw=U0/h(t),  where  h(t)  is  the 
height  of  aircraft  at  the  moment  t,  to  be  aw=U0/10 
for  h<10  to  avoid  division  by  zero.  The  height 
h=10  was  rather  arbitrary.  Another  modification 
was  not  so  small,  but  it  seemed  to  be  reasonable  in 
the  modeling  context:  during  the  flare  phase  in  the 
original  model  ILS  provided  the  desired  values  for 
height  and  vertical  velocity  starting  from  actual 
height  of  aircraft  (45  feet)  at  the  beginning  of  the 
flare  regardless  of  what  the  desired  height  was  just 
before  this  moment. 


Fig. 7. Broken desired trajectory for the original
model (a), and continuous one for modified model
(b).

As a result, the desired trajectory had a break
at h = 45 ft, and the process of learning during the
flare phase was biased. The difference between the
original desired trajectory (a) and the one
implemented here (b) is shown in Fig. 7.

The last modification was related to starting
conditions: the desired interception point calculated
from ILS data was at the end of the runway, so
instead of using the real coordinate x to calculate the
desired (ILS) height we used x = x + 500.
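Two of these modifications are simple enough to state directly in code. The sketch below covers only the clamped intermediate variable and the shifted coordinate; the function and constant names are illustrative, and the ILS height formula itself is not reproduced here.

```python
H_FLOOR = 10.0   # height below which a_w is held constant (rather arbitrary)
X_SHIFT = 500.0  # shift applied to x before computing the desired ILS height


def a_w(u0, h):
    """Intermediate variable U0/h(t), held at U0/10 for h < 10."""
    return u0 / max(h, H_FLOOR)


def ils_x(x):
    """Coordinate used to calculate the desired (ILS) height."""
    return x + X_SHIFT
```

Using max() keeps a_w continuous at h = 10, so the clamp does not introduce a jump of its own.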

Under  these  conditions  the  utility  function 
can  be  written  as 

U = -a_1 \cdot 0.5 (h - h_c)^2 - a_2 \cdot 0.5 (\dot{h} - \dot{h}_c)^2

where h (\dot{h}) is the actual height (vertical speed)
of the aircraft, and h_c (\dot{h}_c) is the height
(vertical speed) value supplied by ILS. Coefficients
a_1 and a_2 varied during the training phase: first,
a_1 was much greater than a_2, reflecting the
priority of landing in the target zone of the runway;
then a_2 was increased to provide fine tuning of
vertical speed.
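A small sketch of this tracking utility and its derivatives with respect to height and vertical speed follows. It reads the formula as a weighted sum of squared tracking errors; the function names and the sample numbers in the usage lines are illustrative, not values from the experiments.

```python
def landing_utility(h, h_dot, h_c, h_dot_c, a1, a2):
    """Weighted squared tracking error in height and vertical speed."""
    return -a1 * 0.5 * (h - h_c) ** 2 - a2 * 0.5 * (h_dot - h_dot_c) ** 2


def landing_utility_grad(h, h_dot, h_c, h_dot_c, a1, a2):
    """Derivatives of the utility with respect to h and h_dot."""
    return (-a1 * (h - h_c), -a2 * (h_dot - h_dot_c))


# Early in training: a1 much greater than a2, so height error dominates.
u_early = landing_utility(12.0, -3.0, 10.0, -2.0, a1=1.0, a2=0.1)
# Later: a2 is raised so the vertical-speed error carries more weight.
u_late = landing_utility(12.0, -3.0, 10.0, -2.0, a1=1.0, a2=1.0)
```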

The  state  vector  comprised  all  aircraft  and 
environment  variables,  thus  allowing  us  to  avoid 
including  the  information  from  previous  time 
intervals  as  additional  input  to  Action  and  Critic 
networks.  Besides,  using  an  analytical  model  we 
can  calculate  exact  derivatives  with  respect  to 
control  and  input  variables,  and  the  accuracy  of 
simulation  and  evaluation  of  state  essentially 
increases,  thus  decreasing  the  number  of  iterations 
necessary  for  convergence. 

How  it  was  really  going 

Training of the system was done with
backpropagation of utility [13], with calculation of
the utility function 10 steps forward. Taking into
account that the sampling interval was 0.01 sec
and the control interval 0.1 sec, it was almost
real-time learning. We include here some comments
to illustrate the concrete organization of this process.
First, normalization of at least some components of
the input vector was necessary (division of height
(h) and horizontal coordinate (x) by the maximal
values of these parameters, known in advance).

Next, for this kind of modeling it is important
to simulate the noise vector outside the Model
itself. When we use the dual subroutine
DualModel to calculate the derivatives with
respect to the control and state vectors at the
backward step, we need the same values of the noise
parameters as were used during the forward step, so
the noise should be either part of the same input
vector for Model and DualModel or an external
vector available to both. Another important point is
already commonplace: don't use the standard C
random number generator, which has extremely bad
statistical properties.
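One way to organize this is to draw the whole noise sequence once, outside the model, and pass the same sample to both the forward and the backward routine. The sketch below uses a toy one-step model whose derivative with respect to control depends on the noise; the model, the names `model` and `dual_model`, and the seed are illustrative assumptions. Python's `random.Random` is based on the Mersenne Twister generator.

```python
import random


def make_noise(n_steps, seed=12345, sigma=0.1):
    """Draw the noise sequence once, from a dedicated seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_steps)]


def model(x, u, w):
    """Toy one-step model; w is the externally supplied noise sample."""
    return 0.9 * x + 0.1 * (1.0 + w) * u


def dual_model(x, u, w):
    """Derivatives of model() with respect to x and u at the same sample w."""
    return 0.9, 0.1 * (1.0 + w)


noise = make_noise(10)
x, u = 1.0, 0.5
trajectory = [x]
for w in noise:                       # forward step
    x = model(x, u, w)
    trajectory.append(x)

# Backward step: each derivative is evaluated with the noise sample
# that was used at the same time index in the forward step.
derivs = [dual_model(trajectory[t], u, noise[t])
          for t in reversed(range(len(noise)))]
```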


Wind  shear,  gusts,  and  generalization 
abilities 

Initially the system was trained in the absence
of wind gusts, with wind shear only. Learning was
fast: within 10 attempts the difference between the
actual and desired trajectories decreased
significantly, and the Action network was able to
land the aircraft starting from default conditions
subject to constraints on touchdown region, vertical
speed, horizontal speed and pitch value.

Then wind gusts were added, and learning
was continued. The Action network was still able to
land the aircraft if control signals were generated
using the information about utility values 10 steps
forward. What really took time here was
training the Action network to generate control
signals responding only to the state vector, without
dynamic weights correction. After a few hundred
iterations it was able to land the aircraft in wind
conditions in the majority of cases.

As can be seen from Fig. 8(a), representing
test results for 200 landings with random noise,
most of the landings ended in the zone called
"shortened runway" in [5] (below 600 ft). The
distribution of vertical velocity is within a narrow
range, though it doesn't satisfy the constraints of the
original model.

Fig. 8. Test results for landings with random wind
gusts: x-coordinate (a), and vertical speed (b).

We can conclude that with a full and exact
analytical model of the aircraft it is possible to
achieve a high rate of success even without guidance
from an Instrument Landing System. More
importantly, this rate has been achieved with the
simplest system design possible: a feedforward
Action network and backpropagation of utility as
the learning method. This tells us that the autolander
problem in this formulation can hardly be
considered a suitable example for demonstrating
the advantages of more sophisticated designs.
DHP, for example, demonstrates approximately the
same success rate, though with a different
distribution of x-coordinate and vertical speed;
however, DHP training took much more time in
this case.

For  examples  of  successful  application  of 
HDP,  BU,  and  DHP  to  this  problem  one  may  see 
[5,7].  A  brief  description  of  successful  application 
of  GDHP  can  be  found  in  [6].  A  lot  of  model 
details  are  discussed  in  [19]  where  the  problem  was 
originally  posed. 

4.  FIRST  OBSERVATIONS  AND 
FURTHER  DEVELOPMENT 

Though the examples described here are
simply test problems, they allow us to make some
informal observations that may be helpful for
real-life applications. First, these problems are not
complex enough to demonstrate the advantages of
more sophisticated designs over simpler ones, like
BTT. Next, they allow multiple final formalizations,
which lead to models of different quality, thus
reducing the comparability of results. The last
technical remark: usage of de facto standard test
problems should be more formal than it is
now. None of the papers cited here deals with
exactly the same problem that any other paper
deals with. There are always small differences:
either the dimensions and components of the input
vector, or different values of coefficients, or a
slightly modified model. This complicates real
comparison and reproducibility of results.

The  same  problems  were  recognized  long  ago 
in  supervised  learning,  and  the  benchmark  set 
PROBEN1  appeared  [20].  In  summary,  there  is  still 
a  need  for  the  creation  of  more  challenging  and 
more  fully  specified  benchmark  problems  and 
benchmarking  rules  for  neurocontrol,  which 
(ideally)  should  be  disseminated  over  the  Web  and 
addressed  in  "challenge"  sessions  in  major 
conferences. 

It is important to distinguish between the
complexity of ACD methods and the complexity of
the problem at hand. The application of these
methods can be straightforward, but the problem
may be underdetermined or contradictory. The
final formalization of a model may be done in
different ways; the details of its implementation are
also essential. Even the simplest examples
described in this paper demonstrate these
possibilities. Fortunately, ACD designs are
powerful enough that they can sometimes handle
ill-defined problems, but this is not always the
case. It is necessary, for instance, to provide proper
normalization, and the same range of calculated
and desired values of derivatives of the utility
function, before a final judgement about modeling
quality. The optimal choice of parameters, like
learning rate or network architecture, is not an
ACD-specific problem. It is just an inner problem of
the implementation modules that constitute the
whole system, and relationships between parameters
of different modules like those mentioned in [8] are
most often the phantoms of more serious problems
inside the modules.

If we consider the "ladder of designs",
defined first in [2] and then in greater detail in [21],
an example clearly demonstrating the advantages of
higher-level designs over lower-level ones should
be something like this:

- self-complete, not allowing extension by
commonsense or physical knowledge;

- formalized up to the end, including input-output
parameters, integration scheme, list of adjusted
parameters, normalization procedures;

- showing a visible increase of some quality
measure while climbing the ladder of designs.

Besides, it should be physically interesting. To
develop such an example will be our next task.

Another direction of further research is to
clarify some stability issues. As could be seen from
a more detailed analysis of the inverted pendulum
example, if the system is inherently unstable an
increase in the calculated Critic value leads to a
greater increase in the target Critic value, and the
system quickly moves to or beyond the accepted
range of parameters. This can be avoided by
programming tricks, like dynamic renormalization,
but it would be preferable and more reliable to solve
such divergence problems at the conceptual level.

5.  CONCLUSIONS 

When  the  problem  at  hand  is  solved,  at  least 
to  the  degree  acceptable  to  potential  users,  it  seems 
so  easy  that  even  the  developer  forgets  the  tricks  he 
applied  and  the  traps  he  avoided.  Yes,  his  abilities 
as  a  developer  have  increased,  and  at  the 
subconscious  level  he  remembers  the  way  to  do 
things  right,  but  here  the  question  of  expertise 
transfer  arises.  If  something  has  been  done,  it 
would  be  useful  to  formalize  a  method  of  problem 
solving,  even  if  it  is  not  the  way  the  problem  was 
actually  solved.  First,  it  makes  it  possible  to 
understand  even  one's  own  results  more  deeply. 
Next, such "step-by-step" instructions allow
newcomers to the field to get their first results
more easily, thus creating positive motivation,
which in turn stimulates their productivity. Also, it
provides guidelines for future research. And finally,
the most ambitious though long-term claim: such
formalization of human experience leads to the
possibility of automating some routine procedures
and tactical decisions now reinvented again and
again.

Abstract

A neural network based planning/learning system
that prunes its decision tree by generalizing upon
results of previous experiences is proposed. The results
for the case of an inverted pendulum controller are
demonstrated [1].

I.  Introduction 

Neural  Networks  (NN)  for  intelligent  control  as  a 
part  of  well  known  structures  with  "Adaptive  Critic" 
are  recommended  [2,  3].  However,  the  advantages  of 
search  are  not  fully  used  because  the  NN  elements  do 
not  have  any  capability  for  searching. 

Real-time Control System (RCS) architecture of
intelligent systems [4] determines the functionality of
Behavior Generation. The latter uses planning as a
combination of job assignment and scheduling [4, 5],
which determines the need for subset selection
(focusing attention) and exploration of alternatives
(searching). The complexity of doing this can be
substantially reduced by forming a multi-resolutional
system [6].

In this paper, we will follow the conceptual
paradigm of multi-resolutional planning/learning
outlined in [7, 8]. We will build upon the learning
algorithms used in Baby-Robot [9, 10], but will
implement them with the planning algorithm.

Planning/learning as a joint process relies upon
simulation of multiple alternatives of the process.
Construction of decision trees is necessary in many
areas where reactive strategies are insufficient to
achieve the desired performance (for example,
manufacturing, obstacle avoidance, control systems,
medical interventions, etc.). Most of these planning
problems are solved in one of the following ways:

•  Search techniques. These include Dijkstra, A*,
beam search, and most AI techniques (such as
genetic algorithms and expert systems). These
solutions are considered to be too slow for
real-time operations and are in most cases
nonlearning search techniques.

•  Dispatching techniques. Frequently the designers
decide that planning is altogether too expensive.
Instead, they supply several general rules
of thumb that are supposed to reactively improve
performance. Unfortunately, these rules
of thumb are very general (FIFO, LIFO, bidding)
and usually under-perform most search algorithms.
The dispatching techniques are equivalent
to "greedy" algorithms and generally cannot
ensure the optimality of solutions.

The proposed NN-based planner/learner has
performance advantages over the dispatching
algorithms with adaptation. It plans (instead of
reacting), and it learns rules (or rules of thumb) that
allow it to search only a reduced set of the decisions
in the decision tree.

The proposed NN planner/learner learns rules of
actions for different lengths of strings of experiences,
and it learns to simulate the results of these actions.
Then, the NN planner creates a look-ahead search
in which the different action rules are explored and
the simulator predicts new states. This tree is explored
before any control is applied.

Before giving the details of the NN planner/learner,
we present its building blocks (NNs) and how they are
trained to learn both the action rules and the simulation
rules.

II.  The  Neural  Network  Algorithm 

In order to develop the NN-based planner/learner,
standard NNs were applied. The training algorithm
uses the Levenberg-Marquardt variation of Newton's
method to modify the NN weights. The algorithm
approximates the Hessian matrix:

$$\nabla^2 F(x) \approx 2 J^T(x) J(x) \qquad (1)$$

where $J(x)$ is the Jacobian matrix. Newton's
method can be written as:

$$x_{k+1} = x_k - A_k^{-1} g_k \qquad (2)$$
for (la = 1; la <= iterations; la++) {
    // Back-propagate the layer sensitivities
    ext_a1 = nncpyi(a1, w3.rows);
    ext_a2 = nncpyi(a2, w3.rows);
    d3 = funcPrime(func3, a3);
    ext_d3 = nncpyd(d3).neg();
    ext_d2 = funcPrime(func2, w3.T(), ext_a2, ext_d3);
    ext_d1 = funcPrime(func1, w2.T(), ext_a1, ext_d2);
    ext_p = nncpyi(inp.T(), w3.rows);

    // Assemble the Jacobian and the gradient direction J^T e
    j1 = nncpy(ext_d1.T(), ext_p.rows).dotMul(nncpyi(ext_p.T(), ext_d1.rows));
    j2 = nncpy(ext_d2.T(), ext_a1.rows).dotMul(nncpyi(ext_a1.T(), ext_d2.rows));
    j3 = nncpy(ext_d3.T(), ext_a2.rows).dotMul(nncpyi(ext_a2.T(), ext_d3.rows));
    jac = new Matrix(j1, ext_d1.T(), j2, ext_d2.T(), j3, ext_d3.T());
    je = jac.T().multiply(e.T());
    grad = je.norm();

    if (grad < grad_min) { la = la - 1; break; }

    // INNER LOOP, INCREASE MU UNTIL THE ERRORS ARE REDUCED
    jj = jac.T().multiply(jac);
    while (mu <= mu_max) {
        dx = caldx(jj, je, mu);   // calculates dx
        int mu_count = 0;
        transfer_dxs_to_dbs_and_dws(dx);
        new_w1 = w1.add(dw1);
        new_w2 = w2.add(dw2);
        new_w3 = w3.add(dw3);
        new_b1 = b1.add(db1);
        new_b2 = b2.add(db2);
        new_b3 = b3.add(db3);
        a = feedForward(inp, new_w1, new_b1, new_w2, new_b2, new_w3, new_b3);
        new_e = out.T().sub(a);
        new_error = 0;   // reset the accumulator before summing the squared errors
        for (k = 0; k < new_e.cols; k++) {   // calculates the new error
            new_error += new_e.data[0][k] * new_e.data[0][k];
        }
        if (new_error < error) break;
        mu = mu * mu_inc;   // changes mu
    }
    if (mu > mu_max) {
        la = la - 1;
        break;
    }
    mu = mu * mu_dec;

    // UPDATE NETWORK
    w1 = new_w1; b1 = new_b1;
    w2 = new_w2; b2 = new_b2;
    w3 = new_w3; b3 = new_b3;
    e = new_e; error = new_error;
    System.out.println("Average error after " +
        la + " iterations = " + error + " mu=" + mu);
    if (error < press) la = iterations + 10;
}
}

Figure 1: Levenberg-Marquardt Java implementation

where $A_k = \nabla^2 F(x)|_{x=x_k}$ and $g_k = \nabla F(x)|_{x=x_k}$.
Writing $F(x) = v^T(x) v(x)$, where $v(x)$ is the vector of network
errors, the gradient is $g_k = 2 J^T(x_k) v(x_k)$. Thus, by plugging in
we get

$$x_{k+1} = x_k - [J^T(x_k) J(x_k)]^{-1} J^T(x_k) v(x_k) \qquad (3)$$

The problem with the previous equation is that
$J^T(x_k) J(x_k)$ may not be invertible. The Levenberg-Marquardt
algorithm guarantees invertibility:

$$x_{k+1} = x_k - [J^T(x_k) J(x_k) + \mu_k I]^{-1} J^T(x_k) v(x_k) \qquad (4)$$

By looking at the previous equation, we see that if
$\mu_k$ is large, then the algorithm approaches the steepest
descent algorithm. If it is 0, it becomes the Gauss-Newton
algorithm. In our implementation, different $\mu$
are tested, and the best one is selected at each epoch.
The core of the Java implementation is shown in Figure 1.
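
As an illustration of update (4), the following is a minimal sketch in Java. It is not the implementation of Figure 1: instead of a neural network it fits a hypothetical two-parameter model y = x0 * exp(x1 * t), so that the Jacobian can be written analytically and the damped 2x2 system can be solved in closed form. The class name LmSketch, the model, and the constants (initial mu, muInc, muDec, muMax) are all assumptions chosen for the example.

```java
final class LmSketch {
    /** Sum of squared errors F(x) = v^T v for the model y = x[0] * exp(x[1] * t). */
    static double sse(double[] x, double[] t, double[] y) {
        double s = 0.0;
        for (int i = 0; i < t.length; i++) {
            double v = x[0] * Math.exp(x[1] * t[i]) - y[i];
            s += v * v;
        }
        return s;
    }

    /** One damped step: x - (J^T J + mu I)^{-1} J^T v, as in equation (4). */
    static double[] step(double[] x, double[] t, double[] y, double mu) {
        double a11 = mu, a12 = 0.0, a22 = mu;   // entries of J^T J + mu I
        double g1 = 0.0, g2 = 0.0;              // entries of J^T v
        for (int i = 0; i < t.length; i++) {
            double ex = Math.exp(x[1] * t[i]);
            double v = x[0] * ex - y[i];
            double j1 = ex;                     // dv/dx0
            double j2 = x[0] * t[i] * ex;       // dv/dx1
            a11 += j1 * j1; a12 += j1 * j2; a22 += j2 * j2;
            g1 += j1 * v;   g2 += j2 * v;
        }
        double det = a11 * a22 - a12 * a12;     // positive whenever mu > 0
        double d1 = (a22 * g1 - a12 * g2) / det;
        double d2 = (a11 * g2 - a12 * g1) / det;
        return new double[] { x[0] - d1, x[1] - d2 };
    }

    /** Outer loop: raise mu until the error drops, then accept and lower mu. */
    static double[] fit(double[] x0, double[] t, double[] y, int iterations) {
        double mu = 1e-3, muInc = 10.0, muDec = 0.1, muMax = 1e10;
        double[] x = x0.clone();
        double error = sse(x, t, y);
        for (int it = 0; it < iterations; it++) {
            double[] cand = step(x, t, y, mu);
            double candError = sse(cand, t, y);
            while (!(candError < error) && mu <= muMax) {
                mu *= muInc;                    // move toward steepest descent
                cand = step(x, t, y, mu);
                candError = sse(cand, t, y);
            }
            if (!(candError < error)) break;    // no improving step was found
            x = cand;
            error = candError;
            mu *= muDec;                        // move back toward Gauss-Newton
        }
        return x;
    }
}
```

Calling LmSketch.fit with noise-free samples of the model and a rough initial guess should recover the generating parameters. The accept/reject logic mirrors the inner loop of Figure 1, where mu is increased until the error is reduced.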

The complete algorithm was successfully tested using
the standard problems (XOR, etc.). As transfer functions,
the log sigmoid $a = \frac{1}{1 + e^{-n}}$ and the
hyperbolic tangent sigmoid $a = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$
were implemented. The hyperbolic tangent was used in the
examples for this paper.
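
The two transfer functions and their derivatives can be sketched in Java as follows. The class and method names are hypothetical and are not taken from the implementation of Figure 1; the derivatives are written in terms of the layer output a, which is a common convention when back-propagating sensitivities.

```java
final class TransferFunctions {
    /** Log sigmoid: a = 1 / (1 + e^{-n}). */
    static double logsig(double n) {
        return 1.0 / (1.0 + Math.exp(-n));
    }

    /** Derivative of the log sigmoid, expressed through its output a. */
    static double logsigPrime(double a) {
        return a * (1.0 - a);
    }

    /** Hyperbolic tangent sigmoid: a = (e^n - e^{-n}) / (e^n + e^{-n}). */
    static double tansig(double n) {
        return Math.tanh(n);
    }

    /** Derivative of the hyperbolic tangent sigmoid, expressed through its output a. */
    static double tansigPrime(double a) {
        return 1.0 - a * a;
    }
}
```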


reset levelcounter
Assign the starting node to the list (level 0)
Do {
    Find a node with level equal to levelcounter
    if found a node {
        Rebuild the schedule for this node
        backtracking through the list
        if (schedule is not complete) {
            Create a new node for each possible
            decision assigning as cost the simulated
            results and marking the level.
            Put them in the list
        }
    } else increment levelcounter
} while (not (this level has a complete schedule and all the
         nodes of the level are open))


Figure 2: Pseudocode for scheduling using dynamic
programming principle

III.  The  Planning  Algorithm 

This paper will go over the two best-known planning
algorithms. Most of the other algorithms are
derived from these two, and they normally lose generality
by doing so.

III.A. Dynamic Programming

The fundamentals of dynamic programming were
introduced in the 1950s by Bellman [11, 12] and thoroughly
explained in [13, 14]. Dynamic programming
interprets any optimization as a multi-stage decision
process. The results of dynamic programming can become
known locally only after the total search for the
best trajectory is completed.

To calculate the optimal criterion value for any subset
of size k, we first have to know the optimal value
for each subset of size k - 1. In this case, k represents the
level in the search tree. Figure 2 shows the pseudocode
of the dynamic programming process.

Dynamic programming originally was applied in
cases where the branches of the tree cost the same.
The space was intentionally discretized such that
moving to successors would generate exactly one unit
of cost. In these cases, dynamic programming opens
exactly the same number of nodes as the Dijkstra algorithm
(next). In cases where the costs of the edges of
the graph are not the same, dynamic programming
opens significantly more nodes than Dijkstra. No matter
which form is chosen, dynamic programming requires
completion of the search to the goal.
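
The level-by-level character of this search can be sketched in Java as follows. The sketch is an assumption-laden stand-in rather than the scheduling procedure of Figure 2: it works on a small explicit graph, and stage k extends the best values known for paths of at most k - 1 arcs by one more arc (essentially the Bellman-Ford relaxation scheme). The class name StagewiseDp and the edge encoding {from, to, cost} are hypothetical.

```java
import java.util.Arrays;

final class StagewiseDp {
    /**
     * edges[i] = {from, to, cost}. Returns the best cost from the source
     * to every node; unreachable nodes keep the value +infinity.
     */
    static double[] solve(int nodes, double[][] edges, int source) {
        double[] best = new double[nodes];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[source] = 0.0;
        // Stage k: optimal values over paths with at most k arcs,
        // computed from the optimal values over paths with at most k - 1 arcs.
        for (int k = 1; k < nodes; k++) {
            double[] next = best.clone();
            for (double[] e : edges) {
                int from = (int) e[0];
                int to = (int) e[1];
                if (best[from] + e[2] < next[to]) {
                    next[to] = best[from] + e[2];
                }
            }
            best = next;
        }
        return best;
    }
}
```

Every stage touches every arc, regardless of cost, which is why all stages must be completed before the value at the goal can be trusted.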

III.B.  Dijkstra  Search 

The average computational requirements are considerably
less than those of the dynamic programming
Assign the starting node to the open list
Do {
    Find the node with the smallest cost in the open
    list and move it to the closed list
    Rebuild the schedule for this node backtracking
    through the closed list
    if (schedule is not complete) {
        Create a new node for each possible decision
        assigning as cost the simulated results.
        Put them in the open list
    }
} while (schedule is not complete)


Figure 3: Pseudocode for scheduling using Dijkstra
algorithm


[15]. A pseudocode for the Dijkstra algorithm applied to
scheduling is presented in Figure 3. The Dijkstra algorithm,
in the same manner as dynamic programming,
guarantees that the solution is optimal. The difference
between dynamic programming and the Dijkstra
algorithm is that dynamic programming iterates on
the number of arcs in the graph while Dijkstra iterates
on the length (or cost) of the arcs. An interesting
comparison between Dijkstra, dynamic programming
and the Floyd-Warshall algorithm is presented in [15].
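
A minimal Java sketch of the open-list/closed-list search of Figure 3 is given below, applied to a small explicit graph rather than to a schedule. It assumes non-negative arc costs; the class name DijkstraSketch and the edge encoding {from, to, cost} are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class DijkstraSketch {
    /** edges[i] = {from, to, cost} with cost >= 0. Returns best costs from the source. */
    static double[] solve(int nodes, double[][] edges, int source) {
        double[] dist = new double[nodes];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        boolean[] closed = new boolean[nodes];
        // Open list ordered by accumulated cost; entries are {node, cost}.
        PriorityQueue<double[]> open =
            new PriorityQueue<>((p, q) -> Double.compare(p[1], q[1]));
        open.add(new double[] { source, 0.0 });
        while (!open.isEmpty()) {
            double[] top = open.poll();        // node with the smallest cost
            int u = (int) top[0];
            if (closed[u]) continue;           // stale entry, already closed
            closed[u] = true;                  // move it to the closed list
            for (double[] e : edges) {
                if ((int) e[0] != u) continue;
                int v = (int) e[1];
                double cand = dist[u] + e[2];
                if (cand < dist[v]) {
                    dist[v] = cand;
                    open.add(new double[] { v, cand });
                }
            }
        }
        return dist;
    }
}
```

Nodes are closed in order of increasing accumulated cost, so the search can stop as soon as the goal node is taken from the open list, whereas an iteration on the number of arcs has to finish all of its stages.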

III.C. Using Neural Networks to Prune the Dijkstra
Tree

We  can  use  NNs  within  the  Dijkstra  Planner  in  two 
possible  ways: 

1.  As simulators. NNs can be trained to predict
what the next state will be, given the current state
and the decision. Of course, there are some cases
where the plant complexity itself will make this
mapping hard to train. In these cases (which may
be most) the system will have to be divided into
different hierarchical levels to handle its complexity.
This issue is outside the scope of this paper.
For the examples given, one level is sufficient, and
this mapping can be learned with sufficient accuracy
to make the planning plausible.

2.  As control rules. If information about the previous
operation of a controller for the plant, or results of
previous searches with the same goal, is available,
this information can be used to train NNs to suggest
patterns of decisions that were successful in the past. We
propose to use abduction: search the tree of future
decisions by exploring first the paths that were
successful in the past. Of course, this technique
will not give better results than the full search,
but there are many cases where full search is not
an option.


IV.  An  Example 

For  this  example  we  have  chosen  the  inverted  pole. 
The  dynamical  equations  are  as  follows: 

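$$\dot{x}_1 = x_2 \qquad (5)$$

$$\dot{x}_2 = \frac{-\dfrac{\mu_p x_2}{m l} + g \sin(x_1) + \cos(x_1)\,\dfrac{U - \mu_r\,\mathrm{sign}(x_4) - m l x_2^2 \sin(x_1)}{M + m}}{l \left( \dfrac{I}{m l^2} + 1 - \dfrac{m \cos^2(x_1)}{M + m} \right)} \qquad (6)$$

$$\dot{x}_3 = x_4 \qquad (7)$$

$$\dot{x}_4 = \frac{U - \mu_r\,\mathrm{sign}(x_4)}{M + m} + \frac{m l \left( \dot{x}_2 \cos(x_1) - x_2^2 \sin(x_1) \right)}{M + m} \qquad (8)$$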

where

    m = 1 kg                      mass of pendulum
    l = 0.5 m                     distance from pivot to center of mass
    I = m l^2 / 3                 inertia
    \mu_p = 2 * 10^{-6}           pivot friction
    M = 1 kg                      mass of cart
    \mu_r = 5 * 10^{-4}           rolling friction
    g = 9.81 m/s^2                gravity


In order to simulate these equations, Runge-Kutta was
implemented in Java (see Figure 4).
The Java applet is shown in Figure 5.

V.  Learning  the  Control  Rules 
V.A.  The  Standard  Way 

The first step was to test the NN algorithm in the
pole simulation using the standard technique of learning
the input-output mapping. The following experiment
was performed:

1.  A controller was built using the following parameters:

    U = (random(1) * 2 - 1) * error
        + 400 * (x_3 - goalpos)
        + 400 * x_4
        - 1089.172 * x_1
        - 401.27 * x_2

To make it a little more realistic, the force U was
not allowed to leave the interval [-300, 300]. Noise
is introduced as a function of the parameter error.
The position of the goal, goalpos, is changed
randomly at random intervals.

2.  A string of 1000 states and actions was recorded
to simulate the normal working of the plant with
some error.
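
The controller of step 1 can be sketched in Java as follows. This is a minimal sketch, not the authors' code: the class name NoisyController is hypothetical, the state is assumed to be stored as x[0..3] = (x_1, x_2, x_3, x_4), and random(1) is assumed to return a uniform number in [0, 1), which is modeled here with java.util.Random.

```java
import java.util.Random;

final class NoisyController {
    /** Saturation: the force is kept inside the interval [-300, 300]. */
    static double clamp(double u) {
        return Math.max(-300.0, Math.min(300.0, u));
    }

    /**
     * Linear state feedback plus uniform noise scaled by the parameter error.
     * x[0..3] holds (x_1, x_2, x_3, x_4) in the notation of the text.
     */
    static double control(double[] x, double goalPos, double error, Random rng) {
        double noise = (rng.nextDouble() * 2.0 - 1.0) * error;
        double u = noise
                 + 400.0 * (x[2] - goalPos)
                 + 400.0 * x[3]
                 - 1089.172 * x[0]
                 - 401.27 * x[1];
        return clamp(u);
    }
}
```

Recording the pair (x, control(x, ...)) at every simulation step would produce a string of states and actions of the kind described in step 2.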
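// Fourth-order Runge-Kutta step. Arrays are 1-based (index 0 is unused):
// y[1..n] is the state at x, dydx[1..n] its derivative, h the step size.
// The advanced state is written to yout[1..n]. derivs(t, y, dydx) is
// defined elsewhere and fills dydx with the derivatives at (t, y).
public void rk4(double[] y, double[] dydx, int n,
                double x, double h, double[] yout) {
    int i;
    double xh, hh, h6;
    double[] dym = new double[n + 1];
    double[] dyt = new double[n + 1];
    double[] yt = new double[n + 1];

    hh = h * 0.5;
    h6 = h / 6.0;
    xh = x + hh;
    for (i = 1; i <= n; i++) yt[i] = y[i] + hh * dydx[i];   // first step
    derivs(xh, yt, dyt);                                     // second step
    for (i = 1; i <= n; i++) yt[i] = y[i] + hh * dyt[i];
    derivs(xh, yt, dym);                                     // third step
    for (i = 1; i <= n; i++) {
        yt[i] = y[i] + h * dym[i];
        dym[i] += dyt[i];
    }
    derivs(x + h, yt, dyt);                                  // fourth step
    for (i = 1; i <= n; i++)                                 // weighted sum of the increments
        yout[i] = y[i] + h6 * (dydx[i] + dyt[i] + 2.0 * dym[i]);
    yt = null;
    dyt = null;
    dym = null;
}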


Figure  4:  Runge-Kutta  Java  implementation 


Figure 5: Pole.class applet of the inverted pole

A NN was trained, using data from the chosen control algorithm,
to learn the force U given the states. The structure
of the net was a 6-6-1 net using the hyperbolic tangent
sigmoid activation function. Figure 6 shows the error
function of the network while training.

Figure 7 shows the error of the plant operating with
the trained NN and the error of the plant operating
with the original controller. This figure was created
by accumulating er = abs(x_1 - goalpos) + abs(x_3)
during the normal operation of the plant. The goal is
randomly changed, causing the waves seen in the
figure. The noise parameter error (not to be confused
with er) was not modified.
The results show similar error functions for the NN
and the original controller. Figures 8 and 9 show
similar results using different cost functions:
er = abs(x_1 - goalpos) and er = abs(x_3), respectively. For
the last two examples, the goal was kept fixed, which
is why the function is not wavy.

Figure 6: Learning error of the input-output function

Figure 7: Similar cumulative error values for the NN
controller and the non-NN controller, for multiple
goals and compound cost function

Please note that these results were achieved without
the planner and are only intended to show that
the NN is learning the control rule.

V.B.  Learning  for  Planning 

One of the biggest problems in learning is finding
the correct measure of goodness that allows the algorithm
to separate the "good" examples from the
"bad" examples. This measure cannot be introduced
unless many circumstances of functioning are taken
into account. For example, if the cart is to the left of
the goal and the pole is angled to the left, one would
imagine that a force to the left could bring the cart
and the balanced pole to the left. If the force is high,
in the short term, the impression will be that the cart
has made good progress. But unfortunately this may
change as soon as the goal is achieved. The forces
of inertia will "over-regulate" the pole. On the other
hand, we can have long-term measures of goodness
that are worthless in the short term.

Figure 8: Similar cumulative error values for the NN
controller (lighter curve) and the non-NN controller,
for one goal and position cost function.

Figure 9: Similar cumulative error values for the NN
controller (lighter curve) and the non-NN controller,
for one goal and angular cost function

Figure 10: Learning error of the state-action to new
state function

Our planning system uses not a single one of these
heuristics but all of them. In this case, we can create
control rules by training NNs on experiences that are
good at different string lengths (strings of
state-action-state...). For example,
we can take all the SAS (state-action-state) triplets
that are above a certain measure of goodness. Goodness
is calculated as the difference between the distances to
the goal before and after the string of actions. The
same can be repeated for SASAS quintuples and for
longer strings. The NNs trained with shorter strings
will have many experiences, since there is a higher number
of shorter strings than longer ones. Their cost measure
will give correct results only in the short term,
which may not be good. On the other hand, NNs
trained with longer strings will have fewer strings, but
the cost measure is trustworthy over a longer term. In
this case we are interested in minimizing the cumulative
error.
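
The selection of good strings of a given length can be sketched in Java as follows. The sketch assumes that the recorded run has already been reduced to a sequence of distances to the goal, one per visited state; the class name ExperienceFilter, the method names, and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ExperienceFilter {
    /**
     * Goodness of the string that starts at state i and spans n actions:
     * distance to the goal before the string minus distance after it.
     */
    static double goodness(double[] distanceToGoal, int i, int n) {
        return distanceToGoal[i] - distanceToGoal[i + n];
    }

    /** Start indices of all strings of n actions whose goodness exceeds the threshold. */
    static List<Integer> select(double[] distanceToGoal, int n, double threshold) {
        List<Integer> good = new ArrayList<>();
        for (int i = 0; i + n < distanceToGoal.length; i++) {
            if (goodness(distanceToGoal, i, n) > threshold) {
                good.add(i);
            }
        }
        return good;
    }
}
```

With n = 1 the selected indices identify SAS triplets and with n = 2 they identify SASAS quintuples. A run of N states contains N - n candidate strings of n actions, so fewer candidates are available as the strings get longer.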

The idea is to have these networks store behavior
that was successful in the past at different string
lengths. The planner will start exploring the decision
space along these successful paths by creating tentative
combinations of strings with different lengths. In
most plants, behavior that was successful in the past
has a higher probability of future success. From the
errors that were introduced in the old plant controller
we want to learn behaviors that were successful not
because of the controller but by chance.

The NN for simulation was trained with the state
and action as input and the next state as output.
The Levenberg-Marquardt algorithm was used and
the structure of the net is a 5-5-4 net. The evolution
of the training error can be seen in Figure 10, and
the output is compared to the Runge-Kutta value in
Figure 11.
Figure 11: Runge-Kutta simulation output (top) and
NN output

The complete simulator can be trained using only
one NN in very small problems. Where it becomes
computationally too costly to increase the number
of nodes until the network properly models
the system, the problem can be divided into smaller
pieces. There could be a set of networks covering all
or some of the space that we want to simulate. This
interesting topic is outside the scope of the paper.

VI.  The  NN  planner 

The neural network planner learns two kinds of
rules from previous experiences: action rules and
simulation rules. Action rules have the form

$$S \rightarrow A, g \qquad (9)$$

where $S$ is a situation vector, $A$ is an action vector, $g$
is a goodness estimate, and $\rightarrow$ implies mapping. These
rules will be used by the planner to select the actions
that were good performers in the past.

Simulation rules are the second kind of rules that
are learned by this algorithm. They have the form

$$S_i, A_i \rightarrow S_{i+1} \qquad (10)$$

where $S_i$ is a situation vector, $A_i$ is an action vector,
and $S_{i+1}$ is an approximation of the state vector where
the system is likely to be after action $A_i$ is applied at
$S_i$.

The  NN  planner  uses  all  the  networks  shown  before. 
The  steps  performed  by  the  planner  are  as  follows: 

1.  The system runs using the previous controller,
introducing noise in its output to the plant. The
state (S) used by the old controller and the action
(A) applied by the controller (with the noise)
are stored in a database.

2.  Experiences ($S_i A_i S_{i+1}$) (Figure 12) are ordered
based on the goodness, where goodness is the
difference of the distances to the goal before and
after A.

3.  The "best" experiences are selected.

4.  A neural network with $S_i$ as input and $A_i$ as output
is trained. We will call these NNs control
rules.
Figure 12: Proposed Approach


Figure  13:  Decision  tree 


5.  A neural network with $S_i$ and $A_i$ as input and
$S_{i+1}$ as output is trained. We call these NNs simulation
elements.

6.  The previous four steps are repeated for
$S_i A_i S_{i+1} A_{i+1} S_{i+2}$,
$S_i A_i S_{i+1} A_{i+1} S_{i+2} A_{i+2} S_{i+3}$, etc.

7.  A lookahead search tree is generated using the
current state as the starting point and the control
rules for $S_i A_i S_{i+1}$, $S_i A_i S_{i+1} A_{i+1} S_{i+2}$, etc.,
where $S_i$ matches the current situation. Once these
actions are proposed, we use the simulation elements
to simulate the next state, which is imaginary
because we have not applied the action yet. We
repeat this procedure until we achieve the
desired imaginary state or we run out of time.
This lookahead planner is illustrated in Figure 13.
Please note that if enough time is allocated, paths
with actions that are not recommended by the
control rules can also be explored.

8.  The first action in the best path through this
decision tree is chosen as the action to execute.
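
Steps 7 and 8 can be sketched in Java as follows. This is a minimal sketch under strong simplifying assumptions, not the authors' planner: the state is a single number, the trained control rules are replaced by a function that proposes candidate actions for a state, the trained simulation elements are replaced by a function that predicts the next state, and the cost of a path is the accumulated distance to the goal. The class name LookaheadPlanner and all parameter names are hypothetical.

```java
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiFunction;
import java.util.function.Function;

final class LookaheadPlanner {
    /** A node of the imaginary decision tree. */
    static final class Node {
        final double state;        // simulated (imaginary) state
        final double cost;         // cumulative error along the path
        final double firstAction;  // action taken at the root of this path
        final int depth;
        Node(double state, double cost, double firstAction, int depth) {
            this.state = state;
            this.cost = cost;
            this.firstAction = firstAction;
            this.depth = depth;
        }
    }

    /** Returns the first action of the best path found, or NaN if no action was proposed. */
    static double plan(double start, double goal,
                       Function<Double, List<Double>> controlRules,
                       BiFunction<Double, Double, Double> simulator,
                       int maxExpansions, double tolerance) {
        PriorityQueue<Node> open =
            new PriorityQueue<>((a, b) -> Double.compare(a.cost, b.cost));
        open.add(new Node(start, 0.0, Double.NaN, 0));
        for (int n = 0; n < maxExpansions && !open.isEmpty(); n++) {
            Node node = open.poll();   // cheapest open node first (Dijkstra order)
            if (node.depth > 0 && Math.abs(node.state - goal) <= tolerance) {
                return node.firstAction;   // desired imaginary state reached
            }
            for (double action : controlRules.apply(node.state)) {
                double next = simulator.apply(node.state, action);
                double first = node.depth == 0 ? action : node.firstAction;
                open.add(new Node(next, node.cost + Math.abs(next - goal),
                                  first, node.depth + 1));
            }
        }
        // Out of time: fall back to the open path that ends closest to the goal.
        Node best = null;
        for (Node node : open) {
            if (node.depth > 0 && (best == null
                    || Math.abs(node.state - goal) < Math.abs(best.state - goal))) {
                best = node;
            }
        }
        return best == null ? Double.NaN : best.firstAction;
    }
}
```

Only the returned first action would be applied to the plant; the tree is rebuilt from the new measured state at the next control step.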
Figure 14: NN planner


Figure  14  shows  a  3D  visualization  of  the  process 
of  developing  a  decision  plan.  As  shown,  only  three 
control  rules  exist  in  this  example.  The  opening  of 
this  decision  tree  follows  the  Dijkstra  algorithm. 

VII.  Conclusions  and  Future  Work 

1.  A planning/learning subsystem is constructed for
use in control systems.

2.  The planner first explores the previously successful
paths, but it continues with the search if sufficient
time is given. Given sufficient time, this planner
will find optimal trajectories.

3.  The system incorporates both learning and
searching, which makes it different and advantageous
in comparison with the known systems.

4.  In the future, we are going to implement and
compare other NN engines: CMAC, conjugate
gradient, etc.

5.  The approach seems to outline a constructive
model of brain-like intelligence.
Abstract

This paper briefly summarizes and cites new work
which tries to bridge the gap between advanced
neural network designs and some of the known
capabilities of mammalian intelligence. The new
design draws heavily on concepts of hierarchy
and temporal chunking from AI, and on
relational representation of objects and space.

Overview 

By 1995, several groups of researchers had
finally implemented neural network designs
which met certain basic criteria for what a brain-like
intelligence should look like [1]. These
designs were intended to be able to learn "any
task" involving goal-directed behavior over time,
without prior knowledge of the task. In technical
terms, such designs have been called "model-based
adaptive critics" or model-based
approximate dynamic programming (ADP).
They have sometimes been classified as a type of
reinforcement learning system.

The engineering benefits of such
designs have become increasingly clear over the
past few years. By now ten groups have
successfully implemented such designs [2]. The
strengths and weaknesses relative to conventional
designs, in terms of stability, etc., are becoming
better understood [3]. In formal terms, these
designs provide a capability for optimization over
time in uncertain noisy nonlinear environments; that,
in turn, permits systems to learn how to track a
narrow desired trajectory or performance
envelope, which is especially useful when trying
to operate highly dynamical nonlinear plants.

In 1970 or so, when I first developed
the theory behind these designs, I hoped that they
would be enough to describe at least those
aspects of higher-order intelligence which exist
in the mammalian brain. This is not the same
thing as explaining consciousness in the minds of
humans. In order to understand consciousness in
the minds of humans, one must also address
issues such as symbolic reasoning, quantum
computing and the soul, etc.; although I do have
ideas about those subjects [4], they are beyond
the scope of this paper. In my view, a scientific
understanding of the mammal-brain level of
intelligence is a prerequisite to a truly deep
understanding of those more advanced levels of
intelligence [2,4]. Indeed, the new design to be
mentioned here does begin to provide some ideas
about the deep structure underlying verbs,
nouns, and adverbs, at the very least.

Nevertheless,  by  1996  [1],  it  was 
apparent  that  these  advanced  designs  still  had 
two  major  deficiencies  -  major  gaps  between  the 
kinds  of  learning  capability  they  provide,  and  the 
kind  which  exists  in  the  mammalian  brain. 

First of all, in order to learn many
difficult planning problems, such as generalized
spatial navigation, it would be necessary to "fill
in the boxes" in these designs with a certain type
of slow but powerful neural network, the "SRN"
[5,6]. To explain how the brain also achieves
high-bandwidth control of muscle movements,
one then needs to assume a "two brains in one"
master-slave design described in [1].

* The views herein are those of the author, not those of NSF, but the paper was written on NSF time and
may therefore be freely copied subject to proper citation.

Second, and more seriously, in order to
learn how to "see" far into the future, the
underlying learning rules have to be changed, to
allow some sort of "temporal chunking." The
usual critic adaptation rules train a network at
time t based on information about time t+1.
One can do far better by allowing the use of t+T,
at times, where T is a larger time interval. But
how can one do this? My concerns about this
were magnified by the growing evidence from
neuroscience that the basal ganglia are a crucial
part of higher-order intelligence, and that they
perform something very similar to task-based
planning in AI [1].
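
The difference between training on information from time t+1 and from time t+T can be illustrated with a small Java sketch. It is only an illustration of the generic multi-step target used in reinforcement learning, and it is not the hierarchical design of [7]: the utility sequence, the critic estimates, the discount factor gamma, and the class name CriticTargets are all assumptions.

```java
final class CriticTargets {
    /**
     * Training target for the critic at time t that looks `horizon` steps ahead:
     * the discounted utilities observed from t to t + horizon - 1, plus the
     * discounted critic estimate at time t + horizon.
     * horizon = 1 gives the usual one-step target.
     */
    static double target(double[] utility, double[] critic,
                         int t, int horizon, double gamma) {
        double sum = 0.0;
        double discount = 1.0;
        for (int k = 0; k < horizon; k++) {
            sum += discount * utility[t + k];
            discount *= gamma;
        }
        return sum + discount * critic[t + horizon];
    }
}
```

With horizon = 1 the target depends on the critic's own estimate one step later. With a larger horizon, information from time t+T reaches the estimate at time t in a single update instead of having to propagate backward one step at a time.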

A careful analysis of the mathematics of
temporal chunking in ADP forced me into the
development of a hierarchical task-based
design, made up of "decision blocks." An
international  patent  disclosure  was  filed  on  this 
design  on  June  4,  1997,  through  Scientific 
Cybernetics,  courtesy  of  Sam  Leven.  The  main 
specification  section  of  that  disclosure  is  being 
published  in  [7].  Of  course,  that  paper  mainly 
deals  with  global  implementation  details.  (The 
disclosure  also  required  300  pages  of  additional 
papers  —  mainly  papers  previously  published  — 
in  order  to  describe  some  of  the  subsystems.)  In 
order  to  complement  that  discussion,  another 
paper  is  being  published  very  soon  [2]  which 
summarizes  the  preceding  history  and  the 
intellectual  strategy  here  in  great  detail.  It 
expands  on  the  implications  for  neuroscience, 
stressing  the  empirical  data  on  the  mammalian 
brain. 

I am very grateful to Alex Meystel for
drawing my attention to [8], soon after the
publication of [2]. This paper made me
appreciate more seriously a third limitation of the
earlier model-based ADP systems: their
treatment of space. In a formal sense, this
problem can be handled without changing the
higher-level architecture of the model [2,7]. It is
a subsystem issue, at least for the mammalian
level of intelligence. (But mammals do not have
to coordinate 100 quasi-independent robotic
pieces of themselves spread out over an acre!)
Nevertheless, the required change in neural
network subsystems is very large, and has major
implications for neuroscience [2] and engineering
applications [7]. It is particularly relevant to tasks
like predicting the behavior of molecules or
controlling complex networks like heat
exchangers, precoolers, fuel processors, utility
grids, etc.


A  Few  Details 
Figure 1. Structure of a Decision Block

Figure  1  is  a  reduced  version  of  a  figure  which 
appears  in  [2].  It  summarizes  the  structure  of  a 
"decision  block."  In  this  theory,  the  upper  level 
of  the  mammalian  brain  (excluding  the  higher 
motivational  systems)  is  essentially  made  up  of  a 
library  of  such  decision  blocks.  The  blocks  are 
formally  distinct,  but  in  practice  (as  in  the  brain) 
they  may  share  hidden  nodes,  and  must  share  a 
common  estimated  state  vector  R. 

The task of each decision block is to
decide which lower-level decision blocks to
activate. This is a crisp, discrete decision. (In [1],
I speculated that the activation might be fuzzy
instead of crisp; however, because of the problem
of local minima, and the need for rational
learning, this did not turn out to be promising.)
For the lowest level block, the decisions are
really just actions u sent to the lower brain.
Decision blocks may be thought of as "active
verbs." The arrangement is strictly sequential in
nature, with only one decision block activated at
any time on any level of the hierarchy.

When a decision block is activated,
three kinds of additional information are also
needed, to guide the actions taken inside the
block: "adverb" modifiers, a fuzzy goal object g,
and compressed information about the goals
inherited from higher levels of the hierarchy. In
[7], mechanisms for adapting all of these
structures or networks are provided.

The fuzzy goal object mainly consists of
an image of a desired state r* plus a vector of
weights (indicating which components of that
image vector are most important). In effect, this
object provides a kind of explicit quadratic
representation of a value function. The need for
this kind of representation became clear only
after considerable analysis of the problem of
communication between decision blocks.
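
One simple reading of such a quadratic representation is a weighted squared distance between the current state and the desired state. The Java sketch below shows that reading only; the sign convention (larger values are better), the diagonal weighting, and the class name FuzzyGoal are assumptions and are not taken from [7].

```java
final class FuzzyGoal {
    /**
     * Quadratic value of state r with respect to the desired state rStar:
     * minus the weighted sum of squared deviations. The value is 0 when
     * r equals rStar and decreases as heavily weighted components deviate.
     */
    static double value(double[] r, double[] rStar, double[] weights) {
        double v = 0.0;
        for (int i = 0; i < r.length; i++) {
            double d = r[i] - rStar[i];
            v -= weights[i] * d * d;
        }
        return v;
    }
}
```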

Within each decision block, neural
networks must be adapted which are very similar
to those in the usual model-based adaptive critics:
local critic networks, a local Model or
prediction network, and a local Decision or action
network. In this case, however, it is especially
critical that the latter two networks be stochastic
networks, rather than the usual sorts of
deterministic neural network [2,7]. Two
additional networks are needed in order to
optimize the interface with other decision blocks.

A  key  aspect  of  using  stochastic 
networks  is  that  one  can  replicate  such 
phenomena  as  "imagination"  or  "dreaming"  or 
novelty-seeking  behavior  which  is  crucial  to 
effective  higher-order  problem-solving  and 
mammalian  behavior  [4,9]. 
ABSTRACT: The paper deals with some formal
models of plausible reasoning. The so-called JSM-method
of automatic hypothesis generation is one of
these models. Some important notions of JSM-reasoning
and its logical basis are informally
introduced in this paper.

1  INTRODUCTION 

Reasoning can be considered as a synthesis of cognitive
procedures. These procedures include the evaluation of
elements of our knowledge such as facts, hypotheses,
and arguments. The general scheme of plausible reasoning
is represented by the notion of Quasi-Axiomatic
Theory (QAT) [Finn 1991], [Finn 1995].
Let $\Sigma$ be the set of axioms that are known to provide an
incomplete characterization of the application domain,
let $\Sigma'$ be an open set of elementary (protocol)
propositions representing facts (we simply call these
propositions facts), where $\Sigma' = \bigcup_{n \in \omega} \Sigma_n$, and $\Sigma_n$ is the set
of facts corresponding to the n-th state of the QAT. Let
$\Re$ be the set of inference rules such that $\Re = \Re' \cup \Re_0$,
where $\Re'$ is the set of plausible inference rules (PIR)
and $\Re_0$ is the set of reliable (deductive) inference rules.
Then $\Gamma' = \langle \Sigma, \Sigma', \Re \rangle$, $\Gamma = \langle \Sigma, \Re \rangle$ and $\Gamma_n = \langle \Sigma, \Sigma_n, \Re \rangle$ will
be called QAT, QAT carcass, and n-th state of QAT,
respectively.

Computer analogs of these concepts are obvious: $\Sigma_n$ is
the state of the data base (the fact base) and $\Gamma$ is the
knowledge base containing declarative knowledge ($\Sigma$)
and procedural knowledge ($\Re$). The QAT itself is the
limiting (ideal) notion in which all the states of the carcass $\Gamma$
are attainable in the limit, e.g., we have
$\Sigma_1 \subset \Sigma_2 \subset \ldots \subset \Sigma_n \subset \ldots$.

A particular realization of the QAT is the JSM-method of
automatic hypothesis generation. The paper discusses this
apparatus and its logical foundation.
Applications of the JSM-method are considered in
[Zabezhailo et al. 1995], [Mikheyenkova 1995].


2  INFINITE-VALUED  LOGIC  WITH 
FINITE  NUMBER  OF  TRUTH  VALUE 
TYPES 


We use a logic with the following types of truth values:
empirically true (denoted by +1), empirically false
(denoted by -1), empirically contradictory (denoted by
0), uncertain (denoted by $\tau$), logically true (denoted by
t), logically false (denoted by f), and others (they can be
added in some special versions of the JSM-method). By $E$
denote the set of all types of truth values. Thus, $E = \{+1, -1, 0, \tau, t, f\}$.
In some special cases, $E$ can be extended.
An informal interpretation of the truth value types is as
follows:

•  the truth value of predicate P is a value of type +1
(empirically true) iff it is known that P (possibly)
holds;

•  the truth value of predicate P is a value of type -1
(empirically false) iff it is known that P (possibly)
does not hold;

•  the truth value of predicate P is a value of type 0
(empirically contradictory) iff there are arguments
both for P and against it;

•  the truth value of predicate P is a value of type $\tau$
(uncertain) iff there are neither arguments for P
nor against it.

The types +1, -1, 0, and $\tau$ are called internal. The
types t and f are called external. We denote by $V_\varepsilon$ the
set of all truth values of type $\varepsilon$. By $V$ we denote the set
of all truth values.

We say that $\varepsilon$ is a non-split type if $V_\varepsilon$ is a one-element
set. We say that $\varepsilon$ is a (denumerable) split type if $V_\varepsilon$ is
a denumerable set. The types +1, -1, and 0 are split
types. The types t, f, and $\tau$ are non-split types.

By $E^{int}$ denote the set of all internal types of truth
values: $E^{int} = \{+1, -1, 0, \tau\}$. By $E^{ext}$ denote the set
which consists of the two external types of truth
values: $E^{ext} = \{t, f\}$. By $E_\omega$ denote the set of all
(denumerably) split types of truth values: $E_\omega = \{+1, -1, 0\}$.
By $E_0$ denote the set of all non-split types of truth
values: $E_0 = \{\tau, t, f\}$. If necessary, we can extend
the set $E$ by adding a finite set of non-split internal
types of truth values. We denote this set by $E_0^{int}$.
For any non-split type $\varepsilon$ there exists a unique truth
value in $V_\varepsilon$ denoted by $\langle \varepsilon, 0 \rangle$ (or by $\varepsilon$). If $\varepsilon$ is a split
type, then $V_\varepsilon = \{\langle \varepsilon, k \rangle \mid k \in \omega'\}$, where
$\omega' = \{0, 1, \dots, \omega\} = \omega \cup \{\omega\}$, i.e., $\omega'$ is the successor
of the first limit ordinal $\omega$. In other words, $\omega'$ is
obtained from the set of all natural numbers by adding
the item $\omega$, which is greater than any natural number.
Suppose we have a procedure of hypothesis generation
such that, using it for an arbitrary predicate P, we can
look for the truth value type of this predicate. Assume
that the truth value of the predicate P is $\langle \varepsilon, k \rangle$, where $\varepsilon$
is a split type of truth values. Then k is the number of
the step at which the truth value type of this predicate
was found. We can use k as a measure of plausibility:
the larger k is, the less plausible the hypothesis. If k=0,
then the truth value type of the predicate is known
before the beginning of the above procedure.
In the standard JSM-method, the truth value $\langle 0, 0 \rangle$ is
not considered. In other words, in the standard JSM-method
an empirical contradiction cannot be a fact,
but it can be a hypothesis.
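
The truth values described above can be sketched as a small data structure. This is only an illustration under our own naming assumptions (`TruthValue`, `OMEGA`, `more_plausible` and the string labels are hypothetical, not part of the JSM-method): a split type carries the step number k at which it was established, a non-split type has a single value, and a smaller k means a more plausible hypothesis.

```python
import math
from dataclasses import dataclass
from typing import Optional

SPLIT_TYPES = {"+1", "-1", "0"}        # each value carries a step number k
NON_SPLIT_TYPES = {"tau", "t", "f"}    # exactly one value per type
OMEGA = math.inf                       # stands for the item omega, greater than any natural number


@dataclass(frozen=True)
class TruthValue:
    type: str
    step: Optional[float] = None       # k for split types, None for non-split types

    def __post_init__(self):
        if self.type in SPLIT_TYPES:
            if self.step is None or self.step < 0:
                raise ValueError("a split type needs a step number k >= 0")
        elif self.type in NON_SPLIT_TYPES:
            if self.step is not None:
                raise ValueError("a non-split type has a single value")
        else:
            raise ValueError(f"unknown truth value type: {self.type}")

    @property
    def is_fact(self) -> bool:
        # k = 0: the type was known before hypothesis generation started
        return self.type in SPLIT_TYPES and self.step == 0


def more_plausible(a: TruthValue, b: TruthValue) -> bool:
    """True if a was established at an earlier step than b (both of split type)."""
    if a.type not in SPLIT_TYPES or b.type not in SPLIT_TYPES:
        raise ValueError("plausibility is compared only for split types")
    return a.step < b.step
```

For instance, `TruthValue("+1", 0)` plays the role of a fact, while `TruthValue("+1", 3)` is a hypothesis obtained at the third step.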

By definition, put $V_{(\tau, n)} = \{\tau\} \cup \bigcup_{\varepsilon \in E_\omega} \{\langle \varepsilon, k \rangle \mid k > n\}$. If
$\varepsilon \in E_\omega$ (i.e., $\varepsilon$ is a split type), then by definition we put
$V_{(\varepsilon, n)} = \{\langle \varepsilon, k \rangle \mid k \le n\}$. Intuitively, the fact that the
truth value of the predicate P belongs to $V_{(\tau, n)}$ means
that we did not find the truth value type of this
predicate within n steps. The fact that the truth value
of the predicate P belongs to $V_{(\varepsilon, n)}$, where $\varepsilon$ is a split
type, means that we found the truth value type of this
predicate within n steps.

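We will use shorter notation for $V_{(\varepsilon, n)}$ and $V_{(\tau, n)}$,
namely $(\varepsilon, n)$ and $(\tau, n)$ respectively. Note that

$(\tau, n) = (\tau, n+1) \cup \{\langle +1, n+1 \rangle, \langle -1, n+1 \rangle, \langle 0, n+1 \rangle\} = (\tau, n+1) \cup \bigcup_{\varepsilon \in E_\omega} \{\langle \varepsilon, n+1 \rangle\}$.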


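In our logic we use the following connectives:

$$J_{\langle \varepsilon, n \rangle}(\alpha) = \begin{cases} t & \text{if } \alpha = \langle \varepsilon, n \rangle, \\ f & \text{otherwise,} \end{cases}$$

$$J_{(\tau, n)}(\alpha) = \begin{cases} t & \text{if } \alpha \in (\tau, n), \\ f & \text{otherwise,} \end{cases}$$

$$J_{(\varepsilon, n)}(\alpha) = \begin{cases} t & \text{if } \alpha \in (\varepsilon, n), \\ f & \text{otherwise,} \end{cases}$$

where $\varepsilon$ is a split type ($\varepsilon \in \{+1, -1, 0\}$);
$\neg$, $\&$, $\vee$, $\rightarrow$, $\leftrightarrow$, $\forall$, $\exists$; the latter are the classical
logical connectives and quantifiers over {t, f}.
We can describe the syntax of this infinite-valued logic
and define terms and formulas in a general way. Both
infinite-valued and two-valued (external) predicates
can be considered in this logic. We can take the
following connectives as basic symbols:
$\neg$, $\&$, $\vee$, $\rightarrow$, $\leftrightarrow$, $\forall$, $\exists$ (the classical two-valued
connectives and the quantifiers; it is sufficient to take
only $\neg$, $\vee$, $\exists$);
$J_\varepsilon$ ($\varepsilon \in E$), $J_{\langle \varepsilon, n \rangle}$ ($\varepsilon \in E_\omega$), where $J_\varepsilon$ and $J_{\langle \varepsilon, n \rangle}$ can be
defined as follows:

$$J_\varepsilon(\alpha) = \begin{cases} t & \text{if } \alpha \in V_\varepsilon, \\ f & \text{otherwise,} \end{cases} \qquad J_{\langle \varepsilon, n \rangle}(\alpha) = \begin{cases} t & \text{if } \alpha = \langle \varepsilon, n \rangle, \\ f & \text{otherwise.} \end{cases}$$

Then by definition, we put

$$J_{(\varepsilon, n)}(p) = \bigvee_{k \le n} J_{\langle \varepsilon, k \rangle}(p), \quad \text{where } \varepsilon \in E_\omega, \text{ and}$$

$$J_{(\tau, n)}(p) = J_\tau(p) \vee \bigvee_{\varepsilon \in E_\omega} \left( J_\varepsilon(p) \,\&\, \neg J_{(\varepsilon, n)}(p) \right).$$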
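
The J-operators can be illustrated by a few Boolean functions. The sketch below is not taken from the paper: it assumes that a truth value is encoded as a pair (type, step), with step `None` for non-split types, and it assumes the reading "found within n steps" means k ≤ n. All function names are hypothetical. The last function rebuilds $J_{(\tau, n)}$ from $J_\tau$, $J_\varepsilon$ and $J_{(\varepsilon, n)}$, mirroring the definitional scheme above.

```python
SPLIT_TYPES = ("+1", "-1", "0")
TAU = ("tau", None)                    # the single value of the non-split type tau


def j_exact(eps, n, value):
    """J_<eps,n>: the value is exactly <eps, n>."""
    return value == (eps, n)


def j_type(eps, value):
    """J_eps: the value belongs to V_eps (only the type is checked)."""
    return value[0] == eps


def j_within(eps, n, value):
    """J_(eps,n): type eps was established within n steps (assumed k <= n)."""
    typ, k = value
    return typ == eps and k is not None and k <= n


def j_undetermined(n, value):
    """J_(tau,n): the type was not established within n steps."""
    typ, k = value
    return value == TAU or (typ in SPLIT_TYPES and k > n)


def j_undetermined_derived(n, value):
    """The same operator expressed through J_tau, J_eps and J_(eps,n)."""
    return j_type("tau", value) or any(
        j_type(eps, value) and not j_within(eps, n, value)
        for eps in SPLIT_TYPES
    )
```

Both versions of the last operator agree on every value, which is a quick sanity check of the definitions under the stated encoding.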

The above logic is an example of the J-definable
J-compact logics considered in [Anshakov, Finn,
Skvortsov 1989]. In that paper it is proved that every
J-definable J-compact logic is strongly axiomatizable,
i.e., for any J-definable J-compact logic L one can
construct a calculus $I$ such that the strong completeness
theorem holds: $\Gamma \vdash_I A$ iff $\Gamma \models_L A$, where
$\Gamma \vdash_I A$ means that the formula A is deducible from
the set of formulas $\Gamma$ in the calculus $I$, and $\Gamma \models_L A$ means
that the formula A is a semantic consequence of the set
of formulas $\Gamma$ in the logic L.


3  SIMPLIFIED  FORMULATION  OF  JSM- 
RULES 

We  consider  items  of  three  types.  The  items  of  the  first 
type  are  objects  of  arbitrary  nature.  The  items  of  the 
second  type  are  fragments  (parts)  of  these  objects. 
Assume  that  we  can  find  the  common  part  (fragment) 
of  any  set  of  objects.  The  fragments  may  be  objects, 
but they are not necessarily objects. The items of the
third  type  are  sets  of  properties  of  objects.  We  can 
imagine  the  objects  as  sets,  the  fragments  as  subsets, 
the  common  part  of  several  objects  as  their 
intersection. 

Suppose X is an object, x is a fragment (part) of an
object, and A is a set of some properties of the objects.
Consider two many-valued binary predicates denoted
by $\Rightarrow_1$ and $\Rightarrow_2$: $X \Rightarrow_1 A$ means that the object X
(possibly) has all the properties from A; $x \Rightarrow_2 A$
means that the fragment x is a (possible) cause of any
property from A. In the simplest case $A = \{a\}$. In this
case we write $X \Rightarrow_1 a$ and $x \Rightarrow_2 a$ instead of
$X \Rightarrow_1 \{a\}$ and $x \Rightarrow_2 \{a\}$ respectively.

For the sake of simplicity, assume that our objects are
represented by sets; their fragments are subsets of
objects; the common part of several objects is their
set-theoretic intersection. Moreover, suppose that all sets
of properties of the objects are one-element sets.
We write $J_{(\tau, n)}(x \Rightarrow_2 a)$ if we do not yet know
whether the fragment x is a (possible) cause of the
property a after n steps of the Hypothesis Generation
Procedure. We write $J_{\langle +1, n \rangle}(x \Rightarrow_2 a)$ if we discovered
that x is a (possible) cause of the property a just at step
n. Similarly, we write $J_{\langle -1, n \rangle}(x \Rightarrow_2 a)$ if at step n we
found arguments for x to be a cause of the absence of a,
and we write $J_{\langle 0, n \rangle}(x \Rightarrow_2 a)$ if at step n we found
arguments for x to be a (possible) cause of both the
presence and the absence of a.

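The first-type JSM-rules are formulated as follows:

$$\frac{J_{(\tau, n)}(x \Rightarrow_2 a),\ M^+_n(x, a),\ \neg M^-_n(x, a)}{J_{\langle +1, n+1 \rangle}(x \Rightarrow_2 a)} \quad (\text{"+"-rule})$$

$$\frac{J_{(\tau, n)}(x \Rightarrow_2 a),\ M^-_n(x, a),\ \neg M^+_n(x, a)}{J_{\langle -1, n+1 \rangle}(x \Rightarrow_2 a)} \quad (\text{"-"-rule})$$

$$\frac{J_{(\tau, n)}(x \Rightarrow_2 a),\ M^+_n(x, a),\ M^-_n(x, a)}{J_{\langle 0, n+1 \rangle}(x \Rightarrow_2 a)} \quad (\text{"0"-rule})$$

$$\frac{J_{(\tau, n)}(x \Rightarrow_2 a),\ \neg M^+_n(x, a),\ \neg M^-_n(x, a)}{J_{(\tau, n+1)}(x \Rightarrow_2 a)} \quad (\text{"}\tau\text{"-rule})$$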


We write $J_{(+1, n)}(X \Rightarrow_1 a)$ if we discovered that the
object X (possibly) has the property a within n steps of the
Hypothesis Generation Procedure. Similarly, we write
$J_{(-1, n)}(X \Rightarrow_1 a)$ if we discovered that X does not have
the property a within n steps.

Suppose $S^+_n(a) = \{X \mid J_{(+1, n)}(X \Rightarrow_1 a)\}$,
$S^-_n(a) = \{X \mid J_{(-1, n)}(X \Rightarrow_1 a)\}$. $M^+_n(y, a)$ means that
there exists $\mathcal{X} \subseteq S^+_n(a)$ such that the following
conditions hold:

(1) $|\mathcal{X}| \ge 2$;

(2) $y = \bigcap_{X \in \mathcal{X}} X$;

(3) $y \ne \emptyset$.

$M^-_n(y, a)$ means that there exists $\mathcal{X} \subseteq S^-_n(a)$ such
that the following conditions hold:

(1) $|\mathcal{X}| \ge 2$;

(2) $y = \bigcap_{X \in \mathcal{X}} X$;

(3) $y \ne \emptyset$.

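Suppose $J_{(+1, n)}(X_i \Rightarrow_1 a)$ $(i = 1, 2, 3)$; $J_{(+1, n)}(Y_1 \Rightarrow_1 a)$, ...; $J_{(-1, n)}(Z_i \Rightarrow_1 a)$
(see Fig. 3-1). Using the first-type JSM-rules, we obtain
$J_{\langle +1, n+1 \rangle}(x \Rightarrow_2 a)$, $J_{\langle 0, n+1 \rangle}(y \Rightarrow_2 a)$, $J_{\langle -1, n+1 \rangle}(z \Rightarrow_2 a)$.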
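
A minimal sketch of the first-type rules under the simplifying assumptions of this section (objects are sets, fragments are subsets, the common part is the intersection). The function names and the string labels for truth value types are our own, not part of the JSM-method. The similarity test relies on the observation that a fragment is the intersection of some family of at least two examples exactly when it equals the intersection of all examples containing it.

```python
def similarity(fragment, examples):
    """M(fragment): the fragment is the non-empty common part of >= 2 examples."""
    fragment = frozenset(fragment)
    if not fragment:
        return False
    # Any family whose intersection equals the fragment consists of supersets
    # of the fragment, so it suffices to intersect all such supersets.
    containing = [frozenset(e) for e in examples if fragment <= frozenset(e)]
    if len(containing) < 2:
        return False
    return frozenset.intersection(*containing) == fragment


def first_type_rule(fragment, positives, negatives):
    """Type assigned to 'fragment => cause of a' at the next step."""
    m_plus = similarity(fragment, positives)
    m_minus = similarity(fragment, negatives)
    if m_plus and m_minus:
        return "0"       # arguments both for and against
    if m_plus:
        return "+1"
    if m_minus:
        return "-1"
    return "tau"         # still undetermined
```

For example, with positive examples {a, b, c} and {a, b, d}, the fragment {a, b} receives the type +1, whereas {a} alone does not, since it is not the full common part of the examples that contain it.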

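We write $J_{(\tau, n)}(X \Rightarrow_1 a)$ if we do not yet know
whether the object X has the property a after n steps of the
Hypothesis Generation Procedure. We write
$J_{\langle +1, n \rangle}(X \Rightarrow_1 a)$ if we discovered that X has the
property a just at step n. Similarly, we can describe the
conditions $J_{\langle -1, n \rangle}(X \Rightarrow_1 a)$ and $J_{\langle 0, n \rangle}(X \Rightarrow_1 a)$.

The second-type JSM-rules are formulated as follows:

$$\frac{J_{(\tau, n)}(X \Rightarrow_1 a),\ \Pi^+_n(X, a)}{J_{\langle +1, n+1 \rangle}(X \Rightarrow_1 a)} \quad (\text{"+"-rule})$$

$$\frac{J_{(\tau, n)}(X \Rightarrow_1 a),\ \Pi^-_n(X, a)}{J_{\langle -1, n+1 \rangle}(X \Rightarrow_1 a)} \quad (\text{"-"-rule})$$

$$\frac{J_{(\tau, n)}(X \Rightarrow_1 a),\ \Pi^0_n(X, a)}{J_{\langle 0, n+1 \rangle}(X \Rightarrow_1 a)} \quad (\text{"0"-rule})$$

$$\frac{J_{(\tau, n)}(X \Rightarrow_1 a),\ \neg \Pi^+_n(X, a) \,\&\, \neg \Pi^-_n(X, a) \,\&\, \neg \Pi^0_n(X, a)}{J_{(\tau, n+1)}(X \Rightarrow_1 a)} \quad (\text{"}\tau\text{"-rule})$$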

$\Pi^+_n(X, a)$ holds iff the following conditions are
satisfied:

(1) $J_{(+1, n)}(y \Rightarrow_2 a)$ for some $y \subseteq X$;

(2) $J_{(-1, n)}(z \Rightarrow_2 a)$ for no $z \subseteq X$.

$\Pi^-_n(X, a)$ and $\Pi^+_n(X, a)$ are defined dually.

$\Pi^0_n(X, a)$ holds iff the following conditions are
satisfied:

(1) $J_{(+1, n)}(y \Rightarrow_2 a)$ for some $y \subseteq X$;

(2) $J_{(-1, n)}(z \Rightarrow_2 a)$ for some $z \subseteq X$.

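For example, let $J_{(+1, n)}(x \Rightarrow_2 a)$, $J_{(-1, n)}(y \Rightarrow_2 a)$
(see Fig. 3-2). Then using the second-type JSM-rules,
we obtain $J_{\langle +1, n+1 \rangle}(X \Rightarrow_1 a)$, $J_{\langle 0, n+1 \rangle}(Y \Rightarrow_1 a)$,
$J_{\langle -1, n+1 \rangle}(Z \Rightarrow_1 a)$, $J_{(\tau, n+1)}(W \Rightarrow_1 a)$.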
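
The second-type rules admit an equally short sketch. Again this is an illustration under assumptions of our own: objects and causes are encoded as sets, and `second_type_rule`, `plus_causes` and `minus_causes` are hypothetical names for the procedure and for the fragments already accepted as (+1)- and (-1)-hypotheses.

```python
def second_type_rule(obj, plus_causes, minus_causes):
    """Type predicted for 'obj has property a' from the known causes."""
    obj = frozenset(obj)
    has_plus = any(frozenset(c) <= obj for c in plus_causes)    # some (+)-cause lies in obj
    has_minus = any(frozenset(c) <= obj for c in minus_causes)  # some (-)-cause lies in obj
    if has_plus and has_minus:
        return "0"       # Pi^0: causes of both presence and absence
    if has_plus:
        return "+1"      # Pi^+
    if has_minus:
        return "-1"      # Pi^-
    return "tau"         # no known cause is contained in obj
```

With the causes x = {a, b} (positive) and y = {e, f} (negative), an object containing only x is predicted to have the property, an object containing both is contradictory, and an object containing neither stays undetermined, as in the example above.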
Figure 3-1: First-Type Rules


4  JSM-THEORY 

The JSM-theory T(M) imitating a system M of rules of
JSM-inference is constructed on the basis of the
J-logic V with a finite number of types of truth values. This
theory is constructed in a many-sorted language L
containing individual variables of three sorts: for
objects, subobjects and properties (or sets of
properties).

The JSM-theory consists of three families of axioms.
Procedural axioms imitate the hypothesis generation
procedures of the JSM-method. Namely, the plausible
rules of the first and second type are rewritten in the
form of implications (so-called axioms of step-by-step
inference):

$$J_{(\tau, n)}(X \Rightarrow_1 A) \,\&\, \Pi^\varepsilon_n(X, A) \rightarrow J_{\langle \varepsilon, n+1 \rangle}(X \Rightarrow_1 A), \quad (\varepsilon, r, n)$$

where r = 1, 2 and $\varepsilon \in E_\omega$.

Similarly, the axioms imitating the first-type rules can
be formulated.

Then the axioms of stabilization are added:

$$\forall X \forall A \left( J_{(\tau, n)}(X \Rightarrow_1 A) \rightarrow J_{(\tau, n+1)}(X \Rightarrow_1 A) \right) \rightarrow \forall X \forall A \left( J_{(\tau, n)}(X \Rightarrow_1 A) \rightarrow J_\tau(X \Rightarrow_1 A) \right)$$

(namely, if at step n no new value $\langle \varepsilon, n+1 \rangle$ is
attributed, then all values that remain underdefined up to that
moment acquire the ultimate value $\omega$).

Similarly, the axiom of stabilization with respect to the
predicate $\Rightarrow_2$ can be formulated.

Figure 3-2: Second-Type Rules

Remark. For each axiom of step-by-step inference, the
converse holds, i.e. the corresponding equivalences are
provable in the JSM-logic. These equivalences correspond to the
constructivity of the generation of truth values in the
JSM-inference.
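
The step-by-step inference together with the stabilization condition behaves like a fixed-point iteration. The sketch below is only an illustration and not the authors' procedure: `run_until_stable`, `infer_step` and the pair encoding (type, step) are hypothetical, and the toy rule `chain_step` stands in for the first- and second-type rules.

```python
def run_until_stable(values, infer_step, max_steps=1000):
    """Apply one inference step after another until no new value appears.

    values: dict mapping an item to (type, step); undetermined items hold ("tau", None).
    infer_step(values, n): dict mapping items to the type proposed at step n + 1.
    """
    values = dict(values)
    n = 0
    while n < max_steps:
        undetermined = {key for key, (typ, _) in values.items() if typ == "tau"}
        proposed = infer_step(values, n)
        new_types = {key: typ for key, typ in proposed.items()
                     if key in undetermined and typ != "tau"}
        if not new_types:
            break                          # stabilization: nothing new at this step
        for key, typ in new_types.items():
            values[key] = (typ, n + 1)     # the value <type, n + 1>
        n += 1
    return values, n


def chain_step(values, n):
    """Toy rule: integer item i becomes +1 once item i - 1 is determined."""
    return {key: "+1" for key in values
            if isinstance(key, int) and key > 0 and values[key - 1][0] != "tau"}
```

Starting from `{0: ("+1", 0), 1: ("tau", None), 2: ("tau", None), "x": ("tau", None)}`, the toy rule determines item 1 at step 1 and item 2 at step 2; after that nothing changes, and the item "x" remains undetermined for good.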

The generating conditions are expressible by
formulas of the language L. These formulas involve the
notions of similarity and difference of the objects
considered, or, more precisely, the operations that express them
(e.g. set-theoretical intersection or the corresponding operation
on graphs etc.).

Algebraic axioms $T_a$ describe the properties of these
operations, e.g. semilattices or Boolean conditions on
& (namely, axioms of atomic Boolean algebras).
Some means of description of finite families of objects
(and subobjects) are necessary too. Namely, one can
use quantifiers over sequences of an arbitrary finite
length (i.e. finite functions on natural numbers) or
quantifiers over finite sets of objects (and the
corresponding axioms of weak monadic second-order
logic are included in $T_a$).

Hence, the algebraic part $T_a$ describes the structure of the
domains (of our three sorts) and corresponds to the
structure of data in a database with incomplete
information.

Finally, the JSM-theory can include some declarative
axioms $T_d$ corresponding to procedures of verification
and falsification of the results of JSM-reasoning, in
particular:

(i)  axioms  of  formal  correctness  representing  the 
interrelation  between  the  predicates  (e.g.  axioms  of 
monotonicity  and  additivity), 

(ii)  axioms  of  control  of  JSM-inference  (axioms  of 
causal  completeness,  consistency  etc.) 

(iii)  descriptive  axioms  characterizing  peculiarities  of  a 
subject  domain  (e.g.,  some  principles  of  sociology  or 
another  field  to  which  the  JSM-method  is  applied). 

ABSTRACT

The present report continues a series of papers
devoted to attempts to describe a formal framework for a
theory of reasoning in sociological investigation and
to apply it to real sociological problems. The logical-combinatorial
JSM-method of automatic hypothesis
generation is chosen as the basis of our approach.
The method realizes a special class of plausible
reasoning in open subject domains with a large amount
of empirical material and ill-formalized knowledge.

KEYWORDS:   plausible   reasoning,  quasi- 
axiomatic  theory,  modelling  in  the  humanities 

1.  INTRODUCTION 

The present report continues a series of papers
devoted to attempts to describe a formal framework for a
theory of reasoning in sociological investigation
and to apply it to real sociological problems [1-3].
The logical-combinatorial JSM-method of
automatic hypothesis generation is chosen as
the basis of our approach. The method realizes a
special class of plausible reasoning in open
subject domains with a large amount of empirical
material and ill-formalized knowledge [4, 5].
Informally, the method can be considered as a
reasoning scheme "similarity - cause - analogy":
the causes of events (phenomena) are sought
by analysing their structural similarity, and
then these causes are employed to forecast new
events (phenomena) by structural analogy. This
scheme corresponds to the principle "similarity
of objects involves similarity of their properties".
Here similarity (or its n-ary generalization) is
considered to be a reflexive and symmetric
relation.

2.  APPLICATION      OF  JSM- 
METHOD  IN  SOCIOLOGY 

Thus sociology, as a humanity that investigates the
similarity of social phenomena and whose
ill-formalized knowledge is generated on the basis
of huge empirical data, can be supposed to be a
field for JSM-method application. However,
certain conditions must be fulfilled for the
JSM-method to be applied successfully (see, for example, [2,
6]); in particular, the empirical data must really
contain causal relations.

The possibility of JSM-application in sociology
is based on the consideration of social systems as
systems with multi-factor (±)-influences.¹
Consequently, we speak about studying the
relation "subject (individual) => behaviour" on the
supposition that the individuality of the person is
responsible for his behaviour. To use the proposed tools for
the analysis of this relation, it is necessary to compare
individuals' behaviour and to reveal similarities
and differences.
For this analysis we need a structural description of
the subject (individual) by a system of
differential indications. Part of them
describes social characteristics (in accordance
with the idea of E. Fromm about social character).
Another part defines the characteristics of the
individual irrespective of social interactions. And
the last part represents features of the person's
biography that are essential for the problem involved. In
other words, the proposed approach makes
individual behaviour prediction possible
when combinatorial analysis is beyond the reach
of the human brain (because of the substantial amount
of information).

Let us underline that the relation "subject
(individual) => behaviour" is the micro-level
representation of the macro-level (macro-sociology)
relation "situation => behaviour". For the
latter relation to be investigated, the "situation"
concept must be specified and formalized.
The general scheme of JSM-reasoning (see [4, 5]) in
application to the problem involved is interpreted as
follows.

(1). Creation of statements about the reasons of
objects' properties (by observing facts
(examples) from the initial data base with
incomplete information, DBII):
application of plausible inference rules of the
first kind (PIR-I). Determinants of
behaviour are defined at this stage.

(2). Prediction of unknown properties of objects
in DBII (the first-step hypotheses about
reasons are used in this prediction),
inference by analogy: application of
plausible inference rules of the second kind
(PIR-II). The forecast of individual behaviour is
carried out at this stage. Forecast control is
realized experimentally (in the corresponding
period of time).

(3). Explanation of facts and results and inference
control on the basis of the sufficient-basis-for-conclusion
criterion (CSBC). We generate
falsificators for DBII automatically by
use of the CSBC.

¹ We are aware that a more objective picture of social
reality must include a description of irregular
disturbances [7], treated by statistical methods.

3.  JSM-METHOD  WITH  SCALES 

Let $U^{(1)} = \{d_1, \dots, d_{r_1}\}$, $U^{(2)} = \{a_1, \dots, a_{r_2}\}$ be
two sets, on which we define two Boolean
algebras, $\mathcal{B}_1 = \{2^{U^{(1)}}, -, \cap, \cup\}$ - the algebra of
objects, $\mathcal{B}_2 = \{2^{U^{(2)}}, -, \cap, \cup\}$ - the algebra of
properties. Consider two partially defined relations:
$\Rightarrow_1^*$ and $\Rightarrow_2^*$. The 2-place predicate symbol $\Rightarrow_1$
corresponds to $\Rightarrow_1^*$, the 2-place predicate symbol
$\Rightarrow_2$ to $\Rightarrow_2^*$. $X \Rightarrow_1 Y$ means that the object X
possesses the set of properties Y; the predicate
describes the data base with incomplete information
(DBII). $V \Rightarrow_2 W$ means that the subobject V
causes the set of properties W; the predicate
describes automatically generated fragments of the
knowledge base (KB). Inner truth values (for the representation
of real facts and semi-facts) have the form
$\bar{v} = (v, n)$, where n is the number of the step at which a PIR is applied.
Traditionally four types of inner truth values v are
used in JSM-logic: +1 - factual truth, -1 - factual
falsity, 0 - empirical contradiction and $\tau$ -
uncertainty. Types of external truth values (for the
representation of facts with valuations and of PIRs): t
- "logical truth", f - "logical falsity". Let $J_{\bar{v}}\Phi$ be the
operator with $J_{\bar{v}}\Phi$ = true if $v[\Phi] = \bar{v}$ and $J_{\bar{v}}\Phi$ = false if
$v[\Phi] \ne \bar{v}$, where $v[\Phi]$ is the valuation.
We suppose the knowledge in empirical theories
(that deal with the open world, progressively
replenished with new facts) to be represented in
the form of a quasi-axiomatic theory (QAT) [4, 5] $\Im = (\Sigma, \Sigma', R)$,
where $\Sigma$ is an open axiom set
describing a subject domain (SD) incompletely;
$\Sigma'$ is an open set of empirical elementary
statements about subjects from the SD ($\Sigma$
corresponds to the knowledge base KB, $\Sigma'$
corresponds to the data base DB). R is a set of inference
rules, $R = R_d \cup R_p$, where $R_d$ is a set of
deductive rules and $R_p$ is a set of plausible inference
rules (PIR).

In the QAT for JSM, $\Sigma'$ is the set of empirical facts
$J_{(v, 0)}(X \Rightarrow_1 Y)$ ($v \in \{1, -1, \tau\}$) of the considered SD.
$J_{(v, 0)}(X \Rightarrow_1 Y)$ means that the statement "object X
possesses the set of properties Y" has the valuation v
(true, false or uncertainty) in the initial state (at
the 0-th step of reasoning). The plausible inference
rules $R_p$ in the JSM-QAT are PIR-I and PIR-II.
Let us turn to our problem.
Let $U^{(1)} = \{u_1, \dots, u_r\}$ be a set of differential
indications describing the qualities of individual
personalities, several social features and biographical
data (in our investigation sociological data are collected by
methods of formalized interview and psychological tests).
$X_i$ (i = 1, ..., k) are objects
(persons, subjects of sociological investigation),
$X_i \in 2^{U^{(1)}}$. $U^{(2)} = \{b_1, \dots, b_s\}$ is a set of
behaviouristic readinesses, $Y_i$ (i = 1, ..., k) is the set
of the i-th subject's behaviouristic readinesses, $Y_i \in 2^{U^{(2)}}$.

The relation $\Rightarrow_1^*$ is formed on the basis of expert
analysis (carried out by sociologists and
psychologists) of question complexes. Since
answers of the types "possibly, yes" and "possibly, no"
are very frequent in sociopsychological
practice (side by side with "yes" (+1) and "no" (-1)
answers), we introduce two new types of
truth values, +1/2 and -1/2, to characterize the
degree of presence or absence of properties. The logical
connectives & and v are defined in the standard
manner; see, for example, [4].
Three strategies for investigating empirical data
with different degrees of presence or
absence of properties (different scales) are possible.

(a). The difference between degrees is supposed
to be unimportant, only the direction (+ or -) is
essential. Then the types of truth values +1/2 and
+1 (as well as -1/2 and -1) merge,
$\forall X \forall Y (J_{(1, 0)}(X \Rightarrow_1 Y) \leftrightarrow J_{(\frac{1}{2}, 0)}(X \Rightarrow_1 Y))$, and
we return to the traditional four types of truth
values (see above).

(b). The mechanisms by which different degrees of
presence or absence of properties manifest themselves are
different. Then
$\forall X \forall Y \neg (J_{(1, 0)}(X \Rightarrow_1 Y) \leftrightarrow J_{(\frac{1}{2}, 0)}(X \Rightarrow_1 Y))$,
special predicates $M^{+\frac{1}{2}}_{a,n}(V, W, k)$ for treating
facts $J_{(\frac{1}{2}, n)}(X \Rightarrow_1 Y)$ (n is the number of the calculation
step) are introduced (analogous to
$M^{+}_{a,n}(V, W, k)$ for treating facts
$J_{(1, n)}(X \Rightarrow_1 Y)$; for -1/2 and -1, symmetrically)
and then the usual scheme of JSM-reasoning is
applied.

(c)  .  ±1/2  is  considered  to  be  "weak"  (expressed 
insufficiently)  ±1  correspondingly  (see  also 


[1]),  VXVYC/^X^YWi.o^iY))- In 
this case we take into account examples with
both types of truth values (+1/2 and +1) to
discover the reasons for the relation $\Rightarrow_1^*$
holding with the truth value +1/2. This
process is formally described by the predicate
$M^{+\frac{1}{2}}_{a,n}(V, W, k)$ (which differs from the predicate
of item (b)). This predicate reveals local
similarity on (+1/2)- and (+1)-examples
$J_{(\frac{1}{2}, n)}(Z_i \Rightarrow_1 U_i)$ and $J_{(1, n)}(Z_j \Rightarrow_1 U_j)$
correspondingly, i = 1, ..., p, j = p+1, ..., k,
where k is variable ($1 \le p \le k$, $k \ge 2$). (For
-1/2, symmetrically.)
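
The three strategies differ in which examples are gathered when the causes of a given degree are sought. The following sketch illustrates only this selection step. The function `select_examples`, the dictionary encoding of facts and the strategy labels are hypothetical, and the treatment of the targets ±1 under strategy (c) (only (±1)-examples are used) is our assumption, since the text specifies strategy (c) only for ±1/2.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def select_examples(facts, target, strategy):
    """Pick the objects used as examples for the target degree.

    facts: dict mapping an object name to its degree in {1, 1/2, -1/2, -1}.
    target: the degree whose causes are sought.
    strategy: "a" (degrees merge), "b" (degrees kept apart), "c" (1/2 is a weak 1).
    """
    def same_sign(a, b):
        return (a > 0) == (b > 0)

    if strategy == "a":
        # only the direction (+ or -) matters
        return [x for x, v in facts.items() if same_sign(v, target)]
    if strategy == "b":
        # each degree has its own mechanism
        return [x for x, v in facts.items() if v == target]
    if strategy == "c":
        if abs(target) == HALF:
            # weak degree: use both the weak and the full examples of that sign
            return [x for x, v in facts.items() if same_sign(v, target)]
        return [x for x, v in facts.items() if v == target]  # assumption, see lead-in
    raise ValueError(f"unknown strategy: {strategy}")
```

With facts `{"X1": 1, "X2": HALF, "X3": -HALF, "X4": -1}` and target +1/2, strategy (b) keeps only X2, while strategies (a) and (c) use both X1 and X2.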

Analogously, the predicate $\Pi^{+\frac{1}{2}}_{n}(V, W, k)$ for the 2nd
kind rules PIR-II (rules for prediction) includes
the subformula expressing the statement that V
contains positive (+1/2)- and (+1)-reasons
$X_1, \dots, X_p$ and $X_{p+1}, \dots, X_k$ of properties $Y_1, \dots, Y_k$
correspondingly, W being covered by these sets
of properties. (For -1/2, symmetrically.)
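PIR-I takes the following form.

$$\frac{J_{(\tau, n)}(V \Rightarrow_2 W),\quad M^{+}_{a,n}(V,W) \,\&\, \neg M^{-}_{a,n}(V,W) \,\&\, \neg M^{-\frac{1}{2}}_{a,n}(V,W)}{J_{\langle 1, n+1 \rangle}(V \Rightarrow_2 W)} \quad (I(+))$$

$$\frac{J_{(\tau, n)}(V \Rightarrow_2 W),\quad \neg M^{+}_{a,n}(V,W) \,\&\, \neg M^{-}_{a,n}(V,W) \,\&\, M^{+\frac{1}{2}}_{a,n}(V,W) \,\&\, \neg M^{-\frac{1}{2}}_{a,n}(V,W)}{J_{\langle \frac{1}{2}, n+1 \rangle}(V \Rightarrow_2 W)} \quad (I(+\tfrac{1}{2}))$$

($V \Rightarrow_2 W$) gets the truth value +1/2 only if it does not
simultaneously get the truth value +1. (For -1/2 and
-1, symmetrically.)

$$\frac{J_{(\tau, n)}(V \Rightarrow_2 W),\quad M^{+}_{a,n}(V,W) \,\&\, M^{-}_{a,n}(V,W) \,\vee\, M^{+\frac{1}{2}}_{a,n}(V,W) \,\&\, M^{-\frac{1}{2}}_{a,n}(V,W) \,\vee\, M^{-}_{a,n}(V,W) \,\&\, M^{+\frac{1}{2}}_{a,n}(V,W) \,\vee\, M^{+}_{a,n}(V,W) \,\&\, M^{-\frac{1}{2}}_{a,n}(V,W)}{J_{\langle 0, n+1 \rangle}(V \Rightarrow_2 W)} \quad (I(0))$$

$$\frac{J_{(\tau, n)}(V \Rightarrow_2 W),\quad \neg M^{+}_{a,n}(V,W) \,\&\, \neg M^{-}_{a,n}(V,W) \,\&\, \neg M^{+\frac{1}{2}}_{a,n}(V,W) \,\&\, \neg M^{-\frac{1}{2}}_{a,n}(V,W)}{J_{(\tau, n+1)}(V \Rightarrow_2 W)} \quad (I(\tau))$$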
In comparison with [4, 5], the representation of the rules
for PIR-II is changed only for ±1/2.

$$\frac{J_{(\tau, n)}(V \Rightarrow_1 W),\quad \Pi^{+\frac{1}{2}}_{n}(V,W) \,\&\, \neg \Pi^{+}_{n}(V,W)}{J_{\langle \frac{1}{2}, n+1 \rangle}(V \Rightarrow_1 W)} \quad (II(+\tfrac{1}{2}))$$

($V \Rightarrow_1 W$) gets the truth value +1/2 only if it does not
simultaneously get the truth value +1. (For -1/2 and
-1, symmetrically.)

The foregoing consideration involves two degrees of
presence or absence of properties: ±1/2 and ±1.
This scheme can be generalized to the case of n
degrees.


4.  CONTEXTUAL  METHOD 

But  the  simple  (binary)  model  of  causal  relation 
"cause  -  effect"  appears  to  be  quite  insufficient 
for  many  empirical  fields.  More  profound  - 
contextual  -  analysis  of  causal  relation  is 
provided  by  the  so-called  "generalized"  JSM- 
method.  We  shall  try  to  develop  an  idea  of 
contextual  analysis  in  application  to  the  problem 
involved. 

Let us introduce new sorts of variables.
Variables of the 5th sort are for sets of attendant
(promoting or deterrent) factors, $U^{(5)} = \{c_1, \dots, c_p\}$,
with constants and variables of the 5th sort
$Z \in 2^{U^{(5)}}$. Variables of the 4th sort are for sets $\mathcal{Z}, \mathcal{Z}_1, \dots,
\mathcal{Z}_m, \dots$ (and corresponding constants), $\mathcal{Z} \in 2^{2^{U^{(5)}}}$.
Alongside the Boolean algebras $\mathcal{B}_i$ (i = 1, 2) (see
above), the Boolean algebras $\mathcal{B}_4 = \{2^{2^{U^{(5)}}}, -, \cap, \cup\}$ -
the algebra of attendant factors - and $\mathcal{B}_3 = \{2^{U^{(5)}},
-, \cap, \cup\}$ - the algebra of external circumstances -
are considered.

Let us consider the ternary predicate F(X, Z, Y) -
"object X under the circumstances Z possesses
the set of properties Y" - instead of the binary
predicate $X \Rightarrow_1 Y$ - "object X possesses (does not
possess) the set of properties Y". The binary
predicate $V \Rightarrow_2 W$ - "the subobject V causes the
set of properties W" - is replaced by two ternary
predicates: $T_{pr}(V, Z, W)$ - "the subobject V under
the circumstances (promoting factors) from the
set Z causes (does not cause) the set of properties
W" - and $T_d(V, Z, W)$ - "the subobject V, if
circumstances (deterrent factors) from the set Z
are absent, causes (does not cause) the set of
properties W".

There is not enough space here to present the
complete formulas for the predicates
$M_{a,pr,n}(V, Z, W)$ and $M_{a,d,n}(V, Z, W)$. We can
only say that the first one describes a set of examples
(this set being a base for plausible inference) and
expresses the exhaustiveness condition, i.e. the demand to
consider all appropriate examples from the initial
DB. The empirical regularity that describes the
predicted causal relation in $M_{a,pr,n}(V, Z, W)$
is expressed by the subformula

$$\forall X \forall Z \forall Y \big( \big( ( J_{(1,n)} F(X, Z, Y) \,\&\, \forall Y' \forall Z' ( J_{(1,n)} F(X, Z', Y') \rightarrow Z' \subseteq Z \,\&\, Y' \subseteq Y ) \,\&\, V \subseteq X ) \,\&\, ( V_1^{pr} \subseteq Z \vee \dots \vee V_r^{pr} \subseteq Z ) \big) \rightarrow W \ne \emptyset \,\&\, W \subseteq Y \big)$$

("V causes the set of properties W under the
condition that elements $V_1^{pr}, \dots, V_r^{pr}$ from the set
of attendant (promoting) factors Z are present").
Other fragments of the predicate
$M_{a,pr,n}(V, Z, W)$ describe the local similarity of
the situations where the promoting factors Z act,
the exhaustiveness condition (taking into account all
these situations), the condition of minimality of $V_i^{pr}$ (i
= 1, ..., r) and the condition that their set is unique.

The predicate $M_{a,d,n}(V, Z, W)$ is formulated
analogously to the generalized predicate of
agreement $M_{n}(V, X, W)$ from [8]. The only
difference is in the sorts of elements from Z and X.
The elements of the set of obstacles X from
$M_{n}(V, X, W)$ are subobjects (from $2^{U^{(1)}}$),
whereas the elements of the set of attendant
factors Z from $M_{a,d,n}(V, Z, W)$ represent
external circumstances (from $2^{U^{(5)}}$) under which
the causal relation exists. The empirical regularity that
describes the predicted causal relation in
$M_{a,d,n}(V, Z, W)$ is expressed by the subformula

$$\forall X \forall Z \forall Y \big( \big( ( J_{(1,n)} F(X, Z, Y) \,\&\, \forall Y' \forall Z' ( J_{(1,n)} F(X, Z', Y') \rightarrow Z' \subseteq Z \,\&\, Y' \subseteq Y ) \,\&\, V \subseteq X ) \,\&\, \neg ( V_1^{pr} \subseteq Z \vee \dots \vee V_r^{pr} \subseteq Z ) \big) \rightarrow W \ne \emptyset \,\&\, W \subseteq Y \big)$$

("V causes the set of properties W under the
condition that elements $V_1^{pr}, \dots, V_r^{pr}$ from the set
of attendant (deterrent) factors Z are absent").
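
The role of promoting and deterrent factors can be sketched as a simple check. The code below illustrates only the reading given in the two quoted statements and not the full predicates, whose formulas are not reproduced in the text. `cause_applies` and its parameters are hypothetical names, and sets stand for objects, circumstances and factors.

```python
def cause_applies(cause, obj, circumstances, promoting=(), deterrent=()):
    """Decide whether a cause contained in obj acts under the given circumstances.

    promoting: factor sets, at least one of which must be present (if any are listed).
    deterrent: factor sets, none of which may be present.
    """
    cause, obj = frozenset(cause), frozenset(obj)
    circumstances = frozenset(circumstances)
    if not cause <= obj:
        return False                     # V must be a subobject of X
    if promoting and not any(frozenset(p) <= circumstances for p in promoting):
        return False                     # no promoting factor is present
    if any(frozenset(d) <= circumstances for d in deterrent):
        return False                     # a deterrent factor is present
    return True
```

For example, a cause {v1, v2} with the promoting factor {c1} applies to an object containing v1 and v2 under the circumstances {c1, c3}, but not under {c3} alone; the same cause with the deterrent factor {c2} is blocked whenever c2 is among the circumstances.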
It is obvious that consideration of context is
essential in the investigation of individual
(collective) behaviour. The proposed approach to
context consideration does not, of course, rule out
the necessity to formalize the "situation" concept and
to investigate the "situation => behaviour" relation,
the latter problems being problems of
macrosociology.

5.  PRESENT  STATUS 

We have just begun to study a particular case
of behaviour: predisposition to one or another
social identification and to solidary actions. We
collaborate with a group of researchers from the
Institute of Sociology, Russian Academy of
Sciences - E.N.Danilova, S.G.Klimova,
O.N.Dudchenko - headed by the Director of the Institute,
Professor V.A.Yadov. The first results of the
JSM-analysis of solidary behaviour have been
obtained. They are in accordance with theoretical
models of sociological investigation. The results
make it possible to represent the totality of
determinants of social actions, reflecting different
types of consciousness.

ABSTRACT

A  survey  of  results  concerning  algorithmic  complexity  of 
learning  in  the  version  space  formed  by  Galois  (concept)  lat- 
tices is  given.  The  results  are  related  to  some  work  in  machine 
learning  and  abductive  reasoning. 

KEYWORDS:  concept  lattices,  machine  learning,  classifica- 
tion, abductive  reasoning 

1.  INTRODUCTION 

Recent studies [12], [3] show that formal concept analysis based on Galois (or concept) lattices is a good means for learning concepts. However, this approach faces considerable computational difficulties, since the number of concepts can be exponential in the number of examples. Therefore, the problem of computational complexity is crucial here. The algorithm reported in [3] is quadratic in the number of concepts generated. In this paper, we present some results concerning the complexity of generating the set of all hypotheses in the version space formed by Galois lattices. We describe an algorithm which is linear in the number of hypotheses (and, therefore, in the number of concepts), and can work in both batch and incremental modes. Unlike the concept clustering method from [3], the JSM-method from [4, 5] uses both positive and negative examples. This allows us to relate the model presented to a generalization of version spaces [8] known as the disjunctive version space approach [10], [9]. We discuss some extensions of the approach to more general data structures (e.g., to graphs).

2.  MAIN  DEFINITIONS:  HYPOTHESES  AND  FORECASTS  (CLASSIFICATIONS)

The model presented here was first proposed in [4] as the JSM-method of automatic hypothesis generation and described in more detail in [5].

Let $E^+$ be a set of positive examples and $E^-$ be a set of negative examples of a class (defined by a property W; in the sequel, we do not mention the name of the property when possible), let $U$ be a set of attributes, and let $I^+$ and $I^-$ be relations defined on $E^+ \times U$ and $E^- \times U$, respectively. The triples $(E^+, U, I^+)$ and $(E^-, U, I^-)$ are called positive and negative contexts, respectively.


Definition 1. Let $A \subseteq U$, $B \subseteq E^+$. The Galois connections on the set $E^+ \times U$ are given in the following way:
$s(A) = \{e \in E^+ \mid e\,I^+\,a \text{ for all } a \in A\}$,
$t(B) = \{a \in U \mid e\,I^+\,a \text{ for all } e \in B\}$.

As in [11], instead of $s(A)$ and $t(B)$ we write $A'$ and $B'$. We will use the notation $A''$ and $B''$ as abbreviations for $t(s(A))$ and $s(t(B))$.

A pair $(A, B)$ such that $A' = B$, $B' = A$ is called a (positive) concept, $A$ is called the intent, and $B$ is called the extent of the concept [11]. The set of all concepts forms a lattice (for properly defined sup and inf operations) called the Galois lattice [3] or concept lattice [11].
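
The two derivation operators and the notion of a concept can be illustrated by a minimal Python sketch. The toy context, the names `POSITIVE`, `extent_of`, `intent_of` and `all_concepts`, and the brute-force enumeration are assumptions made for illustration only; they are not part of the original paper.

```python
from itertools import combinations

# Hypothetical toy positive context: each example is mapped to its attribute set.
POSITIVE = {
    "e1": frozenset({"a", "b", "c"}),
    "e2": frozenset({"a", "b", "d"}),
    "e3": frozenset({"a", "c", "d"}),
}
ATTRIBUTES = frozenset().union(*POSITIVE.values())


def extent_of(attrs, context=POSITIVE):
    """s(A) = A': all examples that have every attribute in attrs."""
    return frozenset(e for e, has in context.items() if attrs <= has)


def intent_of(examples, context=POSITIVE, universe=ATTRIBUTES):
    """t(B) = B': all attributes shared by every example in examples."""
    shared = set(universe)
    for e in examples:
        shared &= context[e]
    return frozenset(shared)


def all_concepts(context=POSITIVE, universe=ATTRIBUTES):
    """Enumerate all (intent, extent) pairs by closing every subset of examples.

    This is exponential brute force, meant only to make the definition concrete.
    """
    found = set()
    names = sorted(context)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = intent_of(subset, context, universe)
            found.add((intent, extent_of(intent, context)))
    return found
```

Every pair returned by `all_concepts` satisfies both conditions of the definition, since the intent is computed from the examples and the extent is recomputed from that intent.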

Definition 2. Let $(E^+, U, I^+)$ and $(E^-, U, I^-)$ be positive and negative contexts. A positive concept $(H, H')$, $H \subseteq U$, is a positive hypothesis generated for these contexts if $H \subseteq e$ for no negative example $e \in E^-$ (here an example is identified with its set of attributes).

Analogs of this definition are used in various paradigms of machine learning. It says that a hypothesis (concept, description, etc.) should cover some positive examples and should not cover any negative example.

A positive hypothesis $(H, H')$ is called minimal by inclusion if there is no positive hypothesis $(H_1, H_1')$ such that $H_1 \subset H$ (minimality of a hypothesis means that it is most general among the set of all hypotheses). Galois connections for negative examples and (minimal) negative hypotheses are defined similarly.
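
A brute-force reading of Definition 2 and of minimality by inclusion might look as follows. This is only a sketch under stated assumptions: the toy data are hypothetical, examples are given directly as attribute sets, and only concepts with a non-empty extent are considered (in line with the remark that a hypothesis should cover some positive examples).

```python
from itertools import combinations

# Hypothetical toy data: every example is represented by its attribute set.
POSITIVE = [frozenset("abc"), frozenset("abd"), frozenset("acd")]
NEGATIVE = [frozenset("ae"), frozenset("cde")]


def common(examples):
    """Attributes shared by a non-empty collection of examples."""
    return frozenset.intersection(*examples)


def positive_hypotheses(positive, negative):
    """Intents of positive concepts that are contained in no negative example."""
    intents = set()
    for r in range(1, len(positive) + 1):
        for subset in combinations(positive, r):
            h = common(subset)  # the intent of the concept generated by subset
            if not any(h <= neg for neg in negative):
                intents.add(h)
    return intents


def minimal_by_inclusion(intents):
    """Keep the intents that have no proper subset among the hypotheses."""
    return {h for h in intents if not any(g < h for g in intents)}
```

On this toy data the intent {a}, shared by all positive examples, is rejected because it is contained in the negative example {a, e}.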

The  following  definition  gives  the  rule  of  forecast 
(classification). 

Definition 3. Let $(E^+, U, I^+)$ and $(E^-, U, I^-)$ be positive and negative contexts, and $Q$ be an object that belongs neither to $E^+$ nor to $E^-$ (i.e., we do not know whether it has or does not have the property W). A positive hypothesis $(H_+, H_+')$ is called a hypothesis in favor of $Q$ if $H_+ \subseteq Q$. A negative hypothesis $(H_-, H_-')$ is called a hypothesis against $Q$ if $H_- \subseteq Q$. $Q$ is called a positive forecast if there is a positive hypothesis in favor of it and there is no negative hypothesis against it. Negative forecasts are defined dually.

The relation "be in favor of" corresponds to the relation "cover" (when the talk is about descriptions covering examples). We avoid using this word here, since "covering" has the opposite order-theoretic connotation.
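
The forecast rule of Definition 3 can be sketched in a few lines of Python. The function name `classify` and the labels "contradictory" and "undefined" for the two remaining cases are assumptions of this sketch; the text above only defines positive and negative forecasts.

```python
def classify(query, positive_intents, negative_intents):
    """Apply the forecast rule to a query given as a set of attributes.

    positive_intents / negative_intents are the intents of the positive and
    negative hypotheses (sets of attributes).
    """
    in_favor = any(h <= query for h in positive_intents)
    against = any(h <= query for h in negative_intents)
    if in_favor and not against:
        return "positive"
    if against and not in_favor:
        return "negative"
    # The two labels below are naming choices of this sketch.
    return "contradictory" if in_favor else "undefined"
```

For example, with hypothetical positive intents {a, b}, {a, c} and a single negative intent {e}, the query {a, b, f} receives a positive forecast.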

3.  COMPLEXITY  RESULTS 

3.1.  Theoretical  Limitations 

It is easily shown that the number of hypotheses can be exponential. Moreover, this number is hard to compute and estimate (which could have been useful for resource allocation). The following result is a testimony to this statement.

Theorem  1.  The  problem  of  determining  the  number  of  all 
positive  (negative)  hypotheses  is  #P-complete. 

Theorem 2. The problem of determining the number of all positive (negative) hypotheses minimal by inclusion is #P-complete.

Theorem 3. The following problem is NP-complete:
INSTANCE Positive context $(E^+, U, I^+)$, negative context $(E^-, U, I^-)$, query $Q$.
QUESTION Does there exist a positive hypothesis in favor of $Q$?

The theorem is proved by considering a combinatorial interpretation of the problem: a problem related to quadripartite graphs [6].

The following particular cases of the problem from Theorem 3 where polynomial-time algorithms are possible were indicated in [6]:

•  The set of negative examples is empty.

•  The number of attributes from $U$ that correspond to the query $Q$ (i.e., $|Q|$) is bounded from above by a constant. This case is very important for applications, for example, in computer drug design, where the size of a chemical compound is limited, but the number of compounds involved as positive and negative examples is large. The algorithm used in this case is linear w.r.t. the number of positive hypotheses in favor of the query.

In the following table, we present results from [6] about the complexity of decision problems concerning hypotheses with restrictions on the sizes of their intents and extents.

                  $\leq k$    $\geq k$
$|H|$             NP          P
$|H'|$            P           NP
$|H| + |H'|$      NP          NP

As above, $|H|$ and $|H'|$ denote the sizes of the intents and the extents of hypotheses, respectively; P denotes that there exists a polynomial algorithm for solving the problem, and NP denotes NP-completeness of the problem. For instance, the upper right element of the table means that the problem "does there exist a hypothesis such that $|H| \geq k$?" ($k$ is a parameter) can be solved by a polynomial time algorithm. The element in the bottom line and right column is indicative of the fact that the problem "does there exist a hypothesis such that $|H| + |H'| \geq k$?" is NP-complete.

3.2.  Close-by-One  (CbO)  Algorithm  for  Computing  Hypotheses  and  Its  Complexity

We assume that all objects from $E^+$ are numbered, and so a set $X \subseteq E^+$ can be represented by a respectively ordered tuple. The numbering of objects from $E^+$ induces a lexicographic ordering of sets from the powerset of $E^+$. For the sake of convenience, we can represent the process of computing hypotheses as a top-down one, which generates some tree whose vertices correspond to hypotheses. During this process, the examples from $E^+$ can be labeled or remain unlabeled in each vertex independently. The following algorithm is based upon the depth-first strategy, though other strategies are possible as well. $Y$ denotes the extent of a current hypothesis.

Close-by-One (CbO) Algorithm

Step 0. There is only one root vertex where all examples are unlabeled, $Y := \emptyset$.

Step 1. The current vertex corresponds to the concept with the extent $Y$. The first unlabeled element of $E^+$, say $X_i$, is taken, and $(Y \cup \{X_i\})'$ and $(Y \cup \{X_i\})''$ are computed. A new vertex that corresponds to $(Y \cup \{X_i\})''$ is generated and connected by an edge to the vertex associated with $Y$.

Step 2. Check conditions (a) and (b):

(a) $(Y \cup \{X_i\})''$ contains objects with numbers less than those of the objects from $Y$ or the number of $X_i$ (i.e., the concept with the extent $(Y \cup \{X_i\})''$ has already been generated).

(b) $(Y \cup \{X_i\})'$ is contained as a subset in a negative example from $E^-$.

If either of these cases holds, all elements of $E^+$ are labeled at the vertex $(Y \cup \{X_i\})''$ (thus, the branch will not be extended). If neither of conditions (a) and (b) holds, we additionally label the element $X_i$ at the vertex $Y$ and all elements of $(Y \cup \{X_i\})''$ at the vertex $(Y \cup \{X_i\})''$.

Step 3. If all elements of $E^+$ are labeled at $(Y \cup \{X_i\})''$, we go to Step 4. Otherwise, $Y := (Y \cup \{X_i\})''$, and we return to Step 1.

Step 4. We backtrack the tree upwards to the nearest vertex with unlabeled elements of $E^+$. If such a vertex exists and corresponds, say, to the extent $Z$, then $Y := Z$ and we go to Step 1. If there is no such vertex, this means that all concepts have been generated and the algorithm halts.
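
The steps above can be sketched in Python as follows. This is a recursive restatement rather than a literal transcription of the labeling scheme: the canonicity test plays the role of condition (a), the comparison with the negative examples plays the role of condition (b), and the loop index replaces the labels. Examples are given as attribute sets, positive examples are numbered by their list position, and the function name `close_by_one` is an assumption of this sketch.

```python
def close_by_one(positive, negative):
    """Generate all positive hypotheses with a non-empty extent.

    positive, negative: lists of frozensets of attributes.
    Returns a list of (intent, extent) pairs, where extent is a frozenset
    of indices into `positive`. Each hypothesis is produced exactly once.
    """
    n = len(positive)
    results = []

    def closure(intent):
        # (Y u {X_i})'': all positive examples that contain the intent.
        return frozenset(i for i in range(n) if intent <= positive[i])

    def extend(extent, intent, start):
        for i in range(start, n):
            if i in extent:
                continue
            # (Y u {X_i})': intersect the current intent with example i.
            new_intent = positive[i] if intent is None else intent & positive[i]
            new_extent = closure(new_intent)
            # Condition (a): a newly added object with a smaller number means
            # that this concept is generated on another branch.
            if any(j < i for j in new_extent - extent):
                continue
            # Condition (b): the intent is contained in a negative example,
            # so neither this concept nor any extension of it is a hypothesis.
            if any(new_intent <= neg for neg in negative):
                continue
            results.append((new_intent, new_extent))
            extend(new_extent, new_intent, i + 1)

    extend(frozenset(), None, 0)
    return results
```

Pruning on condition (b) loses no hypotheses: extending an extent can only shrink the intent, so every descendant of a rejected vertex would also be contained in the same negative example.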

Theorem 4. Let $(E^+, U, I^+)$ and $(E^-, U, I^-)$ be positive and negative contexts, respectively. Then the set of all positive hypotheses can be generated in time $O(|E^+|\,|E^-|\,|U|\,K)$ and space $O((|E^+| + |U| + |E^-|)K)$ by the algorithm CbO, where $K$ is the number of resulting hypotheses. The same estimate holds for negative hypotheses.

Thus, the resources required are linear w.r.t. the number of hypotheses generated, which is better than the results reported in [3], where an algorithm is proposed that has quadratic complexity w.r.t. the number of hypotheses.

Algorithm CbO can work in a purely incremental way without recomputing old hypotheses: new examples are given sequential numbers greater than those of the existing examples and the generation of hypotheses continues. The principle of canonical ordering of hypotheses used in the algorithm enables the parallelization of the algorithm: all hypotheses are generated with the use of local operations and tests, which does not require the interaction of processors, the use of a supervisor, or shared memory.

The CbO algorithm can be applied to more general types of data than the contexts from Definition 1, e.g., to the case where examples are represented by sets of graphs. In this case, we do not use Galois connections, but define the operation $G'$ on a set of graphs $G$ as the set of all maximal by inclusion common subgraphs of the graphs from $G$. Though the problem of computing this operation is NP-hard, the algorithm remains linear with respect to the number of hypotheses generated. More generally, the algorithm can be used for arbitrary data that allows the definition of an idempotent, commutative and associative operation $\sqcap$ (i.e., of inf type) on pairs of objects. The expression $(Y \cup \{X_i\})'$ in the formulation of the CbO algorithm is replaced by $Z = Y_1 \sqcap \dots \sqcap Y_k \sqcap X_i$, where $\{Y_1, \dots, Y_k\} = Y$, and the expression $(Y \cup \{X_i\})''$ is replaced by the operation of taking the set of all objects $S$ that contain $Z$ (i.e., $S \sqcap Z = Z$). In the case of contexts the operation $\sqcap$ corresponds to the set-theoretic intersection $\cap$. The time complexity of the algorithm in the general case is $O(|E^+| K)\alpha + O(|E^+|^2 |E^-| K)\beta$, where $|E^+|$ is the number of positive examples, $K$ is the number of resulting hypotheses, $\alpha$ is the complexity of computing the operation $\sqcap$, and $\beta$ is the complexity of testing the order relation corresponding to the operation $\sqcap$.
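
To make the generalization concrete, the following sketch uses strings as objects and the longest common prefix as the operation of inf type; it is idempotent, commutative and associative. The choice of strings and all names (`meet`, `generalized_intent`, `generalized_extent`) are assumptions made only for illustration; the paper itself discusses graphs.

```python
from functools import reduce


def meet(x, y):
    """Longest common prefix of two strings (an inf-type operation)."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]


def generalized_intent(objects):
    """Replacement for (Y u {X_i})': the meet of all given objects."""
    return reduce(meet, objects)


def generalized_extent(z, universe):
    """Replacement for (Y u {X_i})'': all objects S with meet(S, z) == z."""
    return [s for s in universe if meet(s, z) == z]
```

With the hypothetical universe below, the meet of "interact" and "internal" is "inter", and closing it also pulls in "interval":

```python
words = ["interact", "internal", "interval", "banana"]
z = generalized_intent(["interact", "internal"])
closed = generalized_extent(z, words)
```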

4.  RELATION  TO  OTHER  WORK  AND  DISCUSSION 

As mentioned above, our algorithm constructs Galois or concept lattices as version spaces. The machine learning method reported in [3] uses the same version spaces, but the algorithms used therein are less efficient and do not take into account negative examples (a concept is a particular case of a hypothesis when $E^- = \emptyset$). The version space used in this work is similar to the disjunctive version space proposed in [9, 10] as an extension of Mitchell's version spaces [8]: examples must be covered by a disjunction of concepts, not by a single concept as in the work by Mitchell. Hypotheses defined in this paper correspond to the set S of the lower boundary of the disjunctive version space. The work [10] reports on a quadratic algorithm for the generation of S, which is outperformed by our algorithm. Definition 3 presents a way of plausible reasoning similar to abduction [2]. Moreover, this definition uses a method for generating hypotheses (Definition 2), whereas in abduction models they appear like a deus ex machina. Computational results concerning forecasts are similar both in pessimistic (related to #P-completeness) and optimistic (linear complexity in the number of hypotheses) aspects to those obtained in [2]. A closer look at the connection of the JSM-method with formal concept analysis along the lines of [11] is presented in [7].

The CbO algorithm was used in an applied system for computer drug design [1]. By means of this algorithm the system detects structural causes of biological activity and classifies unstudied compounds w.r.t. the biological activity under study. Chemical compounds can be presented both in descriptor form (as sets of descriptors) and as molecular graphs. Though the latter case is preferable from the point of view of applications (better forecasts are obtained), computing all hypotheses on an average Pentium PC is feasible only for small samples (20-30 examples), as it was in the case of the octane series [1]. The use of the CbO algorithm drastically reduced the computation time (60% on average) as compared with the time needed for the previous algorithms used in the system, which, like those in [3] and [10], had quadratic time complexity in the number of hypotheses.

ABSTRACT

A new approach to full-text processing is presented. The text-processing technique discussed is based on the logical formalism of quasi-axiomatic theories (QAT). A core element of the approach is the use of an original machine learning technique, the so-called JSM-method of automated hypotheses formation. The QAT-based technique is used as a tool for analyzing similarities in the contents of full-text documents. Finally, possible applications of the presented tools as a practical instrument for automated message understanding are discussed.

KEYWORDS: quasi-axiomatic theories, plausible reasoning, machine learning, automated document understanding.

1.  INTRODUCTION 

To design a semiotic model of an open system in many cases means to provide the ability to process data of different types describing an open (unclosed, partially described, etc.) subject area. In more detail, it means supporting the following two characteristic features:

•  to provide the ability (for the system controller and for the model under design) to learn by experience (e.g., by examples¹);

•  to provide the ability to process (by "unified" data OLAP² and decision making instruments applied in a "unified" decision making environment) data of different types (e.g., symbolic and numeric, structured and unstructured, textual and graphical, etc.).

Our approach to designing modeling mechanisms of this type is based [1] on the so-called Quasi-Axiomatic Theories (QAT). Every QAT may be characterized (for more details see, for example, [1] and [4]) by a tuple

$$\Gamma = \langle Z, Z', R \rangle$$

where

•  Z is an axiom set (describing a "current state" of the subject area under investigation and the controlled system; roughly speaking, it is a set of "empirical dependencies", i.e., invariant laws³, describing a situation under analysis);

•  Z' is a set of empirical statements (i.e., elementary "facts") characterizing empirical knowledge (e.g., examples of previously made decisions, etc.);

•  R is a set of reasoning rules consisting of two parts

$$R = R_r \cup R_p$$

where

-  $R_r$ is a set of reliable inference rules (e.g., rules of deductive inference);

-  $R_p$ is a set of plausible reasoning rules (e.g., formalized variants of reasoning by analogy rules, "common sense" reasoning rules, rules of abductive reasoning, etc.).

¹ I.e., to base the current decision making process on the experience of both "good" (i.e., "positive") and "bad" (i.e., "negative") previously made decisions.

² OLAP: OnLine Analytical Processing.

From our viewpoint the challenge is to provide the designed semiotic model of the analyzed open system with the following two features:

•  to be able to learn (by use of reasoning rules from $R = R_r \cup R_p$) new empirical dependencies from "examples" (i.e., from elements of Z') and

•  to be able to use these new dependencies (i.e., the procedural extension of Z processed from Z' by R) in "understanding" and "processing" new situations (i.e., control activities in new situations) for the open system under analysis.
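
As a purely illustrative data-structure sketch, the tuple of axioms Z, facts Z' and rules R can be held in a small Python class, with learning modeled as applying the plausible rules to the facts and adding what they produce to the axioms. The class, its field names, the string encoding of statements and the sample rule are all hypothetical; the text does not prescribe any implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# A rule maps the currently known statements to a set of derived statements.
Rule = Callable[[Set[str]], Set[str]]


@dataclass
class QuasiAxiomaticTheory:
    axioms: Set[str] = field(default_factory=set)               # Z
    facts: Set[str] = field(default_factory=set)                # Z'
    reliable_rules: List[Rule] = field(default_factory=list)    # R_r
    plausible_rules: List[Rule] = field(default_factory=list)   # R_p

    def learn(self) -> Set[str]:
        """Apply the plausible rules to the facts and extend the axioms."""
        derived: Set[str] = set()
        for rule in self.plausible_rules:
            derived |= rule(self.facts)
        new = derived - self.axioms
        self.axioms |= new
        return new


def shared_feature_rule(facts: Set[str]) -> Set[str]:
    """Hypothetical plausible rule: a feature present in every positive fact
    (encoded as '+:f1,f2,...') is proposed as a dependency."""
    positives = [set(f[2:].split(",")) for f in facts if f.startswith("+:")]
    if not positives:
        return set()
    return {f"{feat} => positive" for feat in set.intersection(*positives)}
```

Calling `learn()` a second time on unchanged facts returns an empty set, which mirrors the idea that Z is only extended when the examples support something new.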

2.  AN  APPROACH. 

We start from the methodological hypothesis characterizing a formal model of natural language as a set of local dependencies (i.e., grammatical rules, axioms) and a set of exceptions (i.e., facts that cannot be described by the mentioned axioms). Our aim is to learn from examples of facts and to extend (where possible) some known rules to new cases (new facts). We use the JSM-method (see [1]) as the tool of machine learning and reasoning formalization realized by means of QAT. This technology may be introduced as a methodological platform for machine intelligence systems that implement a synthesis of cognitive procedures. The technology is based on an original and sophisticated mathematical formalism of inductive learning. This formalism provides effective co-operation of the following reasoning techniques:

•  constructive formalization of abductive reasoning (in the sense of C.S.Peirce);

•  constructive formalization of inductive reasoning (in the sense of J.S.Mill);

•  automated deduction;

•  formalized reasoning by analogy.

³ Providing only a partial (i.e., incomplete) description of the situation under analysis.

The presented tools are oriented towards the processing of large volumes of information in full-text databases. They provide a sufficiently "flexible" and effective search for documents that are "relevant" to fixed themes. Using tools of this type it is possible to implement NL-text processing based on "understanding" of documents by computer (this "understanding" is provided by original instruments that model NL-text semantics by special syntactic structures).

The key characteristic of our text-processing tools is the automatic (i.e., implemented by computer) "understanding" of a target set of full-text documents. This "understanding" is provided in the process of co-analysis of the target texts and the semantic structures (so-called concepts) automatically extracted from them. These concepts are formed in the process of machine learning from the analyzed texts. We use Quasi-Axiomatic Theories (QAT) as an original methodological platform for automated text understanding: the technique used for machine learning is formalized by means of QAT.

Basic elements of our approach are the following:

•  automatic extraction and analysis of the "semantic similarity" classes for processed documents;

•  an ability to use, in the process of machine learning, both positive and negative text-precedents;

•  a flexible structure of logical inference rules for document classification (i.e., for similarity classes generation);

•  an ability to combine logical and statistical methods of data analysis.

The most important features that distinguish our approach from other solutions used in the discussed problem area are the following:

•  we are able to provide effective machine learning from small samples of text-precedents;

•  we are able to perform automated text "understanding" based on both positive information (i.e., examples of texts that are relevant to the analyzed theme) and negative information (i.e., examples of texts that are not relevant to the analyzed theme);

•  we are able to design a non-monotonic (i.e., dynamically restructured in accordance with the extension of the text corpora under analysis) structure of the "semantic similarity" classes for processed documents.

3.  APPLICATIONS 


We suppose that QAT-based techniques may be used as a basis for a qualitative breakthrough in the area of automated text understanding. JSM-style document processing tools can provide NL-document processing which realizes automated document clustering (e.g., "natural kind"⁴ clustering of theme vectors extracted from documents by linguistic analysis tools of the ORACLE ConText type [2]). In contrast to the approach based on the analysis of statistical correlations⁵ (and realized, for example, in PATHFINDER [3]), the JSM-style document processing and "understanding" technique can provide an exhaustive deterministic analysis and "context-dependent" computational document understanding.

We suppose that the most important applications of our theoretical approach and software tools might be found in the following problems:

•  the improvement of effective and rapid information searches on the INTERNET;

•  automated message understanding;

•  automated classification of full-text documents;

•  improvement of the text-processing effectiveness for standard document management systems;

•  improvement (based on the use of machine learning techniques) of the effectiveness of information search tools in standard text-based information retrieval systems;

•  an analysis of "semantic analogy" for sets of full-text documents (e.g., an analysis of how closely connected, in the sense of their content, two sets of texts are);

•  automatic diagnostics for the existence of "many aspects" (multi-dimensional semantics) in the content of the text corpora under analysis;

•  automatic diagnostics of a "new content" occurrence caused by an extension of the analyzed text corpora by new elements (i.e., by new texts);

and some other fields characterizing applications of intelligent information systems (see, for example, [5]).

⁴ I.e., clustering based on machine learning from examples of documents that are relevant to the processed query (i.e., from so-called positive examples) and from examples of documents that are not relevant to the processed query (i.e., from so-called negative examples).

⁵ More correctly: it is based on the analysis of "distances" between "lexical units" (e.g., words, themes, ...) and on the analysis of their "coexistence" in analyzed documents.

4.  REFERENCES

[1] Finn V.K. "Plausible inferences and plausible reasoning". - Journ. Soviet Mathematics. Plenum Publ. Corp., vol. 56, N 1, 1991.

[2] "CONTEXT: Introduction to Oracle ConText". - ORACLE White Paper. - ORACLE Corp., September, 1993. - Pp. 1-17.

[3] "PATHFINDER (Version 7.0)". - User's Guide. - Presearch Inc., Fairfax, VA. - 1994.

[4] Zabezhailo M.I. et al. "Reasoning Models for Decision Making: Applications of JSM-Method for Intelligent Control Systems". - Architectures for Semiotic Modeling and Situation Analysis in Large Complex Systems. - Proc. of the Workshop of 10th (1995) IEEE Symp. on Intelligent Control (Eds.: J.Albus, A.Meystel, D.Pospelov, T.Reader). - 27-29 August 1995, Monterey, CA. - AdRem, Inc., 1995. - Pp. 99-108.

[5] Zabezhailo M.I., Finn V.K. "Intelligent information systems". - International Forum on Information and Documentation (FID, The Hague, Netherlands). - 1996. - Vol. 21. - N 2. - Pp. 21-31.



NonMonotonic  Reasoning  in  the  Modal  Quantificational  Logic  Z 


Frank  M.  Brown 
Artificial  Intelligence  Laboratory 
University  of  Kansas 
Lawrence,  Kansas,  66045 
brown@eecs.ukans.edu 


Abstract 

Nonmonotonic  reasoning  can  be  recursively 
axiomatized  as  a  formal  axiomatic  theory  which  is 
monotonic.  This  paper  describes  how  this  is  done 
beginning  with  an  axiomatization  of  logical  truth  with 
the  modal  quantificational  logic  Z  which  builds  on 
previous  works  of  Leibniz,  Carnap,  Prior,  and 
especially  Bressan.  It  explains  how  to  form  default 
statements  in  terms  of  modal  possibility  with  respect 
to  a  theory,  how  to  reduce  possibility  with  respect  to 
a  theory  to  logical  possibility,  how  to  determine  when 
something  is  logically  possible,  how  to  form 
reflective  equations  where  the  default  is  formed  using 
the  theory  containing  the  default  itself,  how  to  solve 
various  types  of  reflective  equations  in 
quantificational  logic  to  obtain  their  "fixed  points", 
and  how  to  deal  with  multiple  solutions  by  taking
their  commonality.  In  a  broader  sense  this  work
supports  Lewis's  original  claim  that  modal  logic  is  a 
missing  part  of  classical  logic  in  that  it  illustrates  how 
classical  logic  extended  with  modal  logic  accounts 
for  nonmonotonic  reasoning. 

1.  Introduction 

NonMonotonic Reasoning is a form of deductive reasoning that allows general laws to be asserted and general conclusions to be drawn from them without contradiction in the presence of numerous exceptions. For example, in a nonmonotonic system one may assert the general law that birds fly even though we also know that penguins are birds which do not fly. In a nonmonotonic system the assumption that there are such penguins will not be contradictory. This is achieved by representing general laws as defaults rather than merely as universally quantified implications. Thus instead of writing "birds fly" as for all x if x is a bird then x flies:

$$\forall x((\mathit{bird}\ x) \rightarrow (\mathit{fly}\ x))$$

it is written as for all x if x is a bird and if it is logically possible with respect to a theory $\Gamma$ for x to fly then x flies:

$$(\forall x(((\mathit{bird}\ x) \land \underline{(<>(\Gamma \land (\mathit{fly}\ x)))}) \rightarrow (\mathit{fly}\ x)))$$

The underlined phrase speaks of logical possibility, thereby implying a reduction of nonmonotonic reasoning to modal logic. There are of course many different modal logics, but there is one that is particularly relevant, namely the modal quantificational logic Z [Brown90], whose possibility operator is analogous to the semantical concept of satisfiability in classical first order logic. For all sentences $\Gamma$ of first order logic,

$$(\mathit{Satisfiable}\ (\mathit{kwote}\ \Gamma)) \text{ if and only if } (<>\Gamma)$$

where $(\mathit{kwote}\ \Gamma)$ is the name of $\Gamma$ in the metatheory.

The remainder of this paper describes nonmonotonic reasoning in terms of the Z modal logic. Z is axiomatized in Section 2. Proof techniques for determining what is logically possible are given in Section 3. Section 4 introduces reflective equations and describes an approach to solving them. Section 5 describes the propositional quantifier disjunction idiom, which is used to represent the disjunction of solutions to a reflective equation. Some conclusions are drawn in Section 6.
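
The analogy between possibility and satisfiability can be tried out on a small propositional analogue of the bird example. The sketch below is only an illustration under strong assumptions: it works with three propositional atoms instead of quantified sentences, checks satisfiability by brute force, and ignores the reflective (fixed point) aspect treated later in the paper. All names are hypothetical.

```python
from itertools import product

ATOMS = ["bird", "penguin", "fly"]


def satisfiable(formula):
    """Brute force: is there a truth assignment that makes formula true?"""
    return any(formula(dict(zip(ATOMS, values)))
               for values in product([False, True], repeat=len(ATOMS)))


def laws(v):
    # Hypothetical hard laws: penguins are birds, and penguins do not fly.
    return (not v["penguin"] or v["bird"]) and (not v["penguin"] or not v["fly"])


def flies_by_default(facts):
    """Conclude 'fly' when the theory entails 'bird' and 'fly' is possible.

    The theory (gamma) is the conjunction of the laws and the given facts;
    'possible with respect to gamma' is read as satisfiability of gamma and fly.
    """
    gamma = lambda v: laws(v) and facts(v)
    entails_bird = not satisfiable(lambda v: gamma(v) and not v["bird"])
    fly_possible = satisfiable(lambda v: gamma(v) and v["fly"])
    return entails_bird and fly_possible
```

Given only the fact that something is a bird, the default fires; adding the fact that it is a penguin makes flying impossible with respect to the theory, so the default is blocked without any contradiction.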

2.  Axiomatization  of  Z  Modal  Logic 

The modal quantificational logic Z is an eight tuple: $(\rightarrow, \#f, \forall, =, [], \mathit{vars}, \mathit{predicates}, \mathit{functions})$ where $\rightarrow$, #f, $\forall$, =, and [] are logical symbols, vars is a set of variable symbols, predicates is a finite set of predicate symbols each of which has an implicit arity specifying the number of terms to be associated with that predicate, and functions is a finite set of function symbols each of which has an implicit arity specifying the number of terms to be associated with that function. The set of logical symbols, the set of variables, and the set of predicate and function symbols are pairwise disjoint. The set of terms is the smallest set which includes the variables and is closed under the process of forming new terms from other terms using the function symbols of the language. The set of sentences is the smallest set which includes #f, the variables, and each of the predicates followed by an appropriate number of terms, and is closed under the process of forming new sentences from other sentences using the logical symbols of the language, provided that no variable in any subexpression of a sentence has free occurrences both as a sentence and as a term. Variables which occur only in term positions are called object variables. Variables which occur only in sentence positions are called propositional variables. Roman letters possibly indexed with digits are used as variables of Z; for example: p, q, r, w, v, $x_1 \dots x_n$, $y_1 \dots y_n$. Greek letters are used as syntactic metavariables: $\gamma$ ranges over the variables, $\pi$, $\pi_1 \dots \pi_n$ range over the predicate symbols, $\phi$, $\phi_1 \dots \phi_n$ range over function symbols, $\delta$, $\delta_1 \dots \delta_n$ range over terms, and $\alpha$, $\alpha_1 \dots \alpha_n$, $\beta$, $\beta_1 \dots \beta_n$, $\Gamma$, and $\Gamma_1 \dots \Gamma_n$ range over sentences. Thus, the terms are of the forms $\gamma$ and $(\phi\ \delta_1 \dots \delta_n)$, and the sentences are of the forms $(\alpha \rightarrow \beta)$, #f, $(\forall\gamma\ \alpha)$, $([]\alpha)$, $(\pi\ \delta_1 \dots \delta_n)$, $(\delta_1 = \delta_2)$, and $\gamma$. A nullary predicate $\pi$ or function $\phi$ is written as a sentence or term without parentheses, i.e., $\pi$ instead of $(\pi)$ and $\phi$ instead of $(\phi)$.

The primitive symbols of Z are shown in figure 1.

Symbol     Meaning
α→β        if α then β.
#f         falsity
∀γ α       for all γ, α.
δ1=δ2      δ1 is (contingently) equal to δ2.
[]α        α is logically necessary

Figure 1: Primitive Symbols of Z

The defined symbols of Z are listed in figure 2 below.

Symbol      Definition                  Meaning
¬α          α→#f                        not α
#t          ¬#f                         truth
α∨β         (¬α)→β                      α or β
α∧β         ¬(α→¬β)                     α and β
α↔β         (α→β)∧(β→α)                 α if and only if β
∃γ α        ¬∀γ¬α                       some γ is α
<>α         ¬[]¬α                       α is logically possible
α≡β         [](α↔β)                     α and β are synonymous
[β]α        ([](β→α))                   β entails α
<β>α        (<>(β∧α))                   α is possible in β
δ1[=]δ2     ([](δ1=δ2))                 δ1 necessarily equals δ2
(world α)   (<>α)∧(∀γ(([α]γ)∨([α]¬γ)))  α is a world
            where γ is not in α
(gen α)     ((∃γ11...γm1(α≡(π1 γ11...γm1)))
             ∨...∨
             (∃γ1n...γmn(α≡(πn γ1n...γmn))))
            where all the predicates are π1...πn
            and the arity of πi is mi and for each
            i and j, γij is a distinct variable
            not in α                    α is a generator

Figure 2: Defined Symbols of Z
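
The propositional rows of figure 2 can be checked mechanically. The following is a minimal sketch, not part of the original paper: it treats → and #f as the only primitives, builds ¬, ∨, ∧ and ↔ exactly as figure 2 defines them, and compares the results with the usual truth tables. The function names (implies, neg, disj, conj, iff) are illustrative assumptions.

```python
from itertools import product

FALSE = False  # plays the role of #f


def implies(a: bool, b: bool) -> bool:
    """Primitive connective: 'if a then b'."""
    return (not a) or b


def neg(a: bool) -> bool:
    """¬α is defined as α→#f."""
    return implies(a, FALSE)


def disj(a: bool, b: bool) -> bool:
    """α∨β is defined as (¬α)→β."""
    return implies(neg(a), b)


def conj(a: bool, b: bool) -> bool:
    """α∧β is defined as ¬(α→¬β)."""
    return neg(implies(a, neg(b)))


def iff(a: bool, b: bool) -> bool:
    """α↔β is defined as (α→β)∧(β→α)."""
    return conj(implies(a, b), implies(b, a))


def definitions_agree() -> bool:
    """Compare each defined connective with its standard truth table."""
    return all(
        neg(a) == (not a)
        and disj(a, b) == (a or b)
        and conj(a, b) == (a and b)
        and iff(a, b) == (a == b)
        for a, b in product([False, True], repeat=2)
    )
```

Calling definitions_agree() exercises all four truth assignments; the modal and quantified rows of figure 2 are outside the scope of this sketch.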

Z is effectively axiomatized with a recursively
enumerable set of theorems as the set of axioms is
itself recursively enumerable and its inference rules
are recursive. The classical (i.e., non-modal) axioms
and inference rules of Z include those of
quantificational logic [Mendelson] given in figure 3.
The laws MR1, MR2, MA1, MA2, MA3, MA4,
MA5, MA6, and MA7 are a standard set of axioms
and inference rules for first order quantificational
logic except for the following two points: First,
because γ in laws MR2 and MA4 may be a
propositional variable these two laws constitute an
additional fragment of 2nd order logic beyond first
order logic. Propositional quantification in modal
logics has been investigated by [Fine 70]. A
conceptual interpretation of quantification [Garson
84] is at the heart of this logic, although this will not
be noticed until one deals with necessity, because all
the normal laws of first order logic hold otherwise.

MR1: from α and (α→β) infer β
MR2: from α infer (∀γ α)
MA1: α→(β→α)
MA2: (α→(β→ρ))→((α→β)→(α→ρ))
MA3: ((¬α)→(¬β))→(((¬α)→β)→α)
MA4: (∀γ α)→β
     where β is the result of substituting an
     expression (which is free for the free positions
     of γ in α) for all the free occurrences of γ in α.
MA5: ((∀γ(α→β))→(α→(∀γ β)))
     where γ does not occur in α.
MA6: ((x=y)→
      ((∧i=1,n ((πi z1i...x...zmi)↔(πi z1i...y...zmi)))
       ∧(∧i=1,n2 ((φi z1i...x...zmi)=(φi z1i...y...zmi)))))
     where π1...πn are all the predicates
     and φ1...φn2 are all the functions.
MA7: x=x

Figure 3: The Classical Rules and Axioms of Z


The modal inference rule and axioms of Z which are
about logical necessity (i.e., []) are given in figure 4.
The laws R0, A1, A2 and A3 constitute an S5 modal
logic [Hughes&Cresswell 68] which, with the
nonmodal laws, is similar to [Carnap46, Carnap56]
and a first order logic version of [Bressan 72]. A4
says that a proposition is logically necessary if it is
entailed by every world proposition. This law was
implied in [Leibniz] and has been used by a number
of authors including [Prior&Fine 77, Fine 70]. Laws
A5, A6, and A7 axiomatize the predicates. A5 is the
key axiom which says that any exhaustive
conjunction of negated or unnegated distinct
generators is a world, provided that there is a
sentence α expressible in the formal language of Z
which holds when p is an unnegated generator of that
conjunction. Like the A5 axiom in [Brown87] it
handles quantifiers over arbitrary domains. The A5
axiom is far stronger than the trivial modality axioms
such as ∃p∃q((¬[p]q)∧(¬[p]¬q)) assumed in [Lewis
36] and ∃p((<>p)∧(<>¬p)) assumed in [Bressan 72].
It also extends certain axiom schemas used in
propositional logic S5c [Hendry&Pokriefka 85] and
propositional logic S13 [Cocchiarella]. Laws A6 and
A7 axiomatize the intensionality of predicate
arguments and are compatible with Bressan's
[Bressan 72] view that predicates map tuples of
concepts into propositions. Laws A8 and A9
axiomatize the functions which may be viewed as
mappings of tuples of concepts into concepts.

R0: from α infer ([]α)
A1: ([]p)→p
A2: ([p]q)→(([]p)→([]q))
A3: ([]p)∨([]¬[]p)
A4: (∀w((world w)→([w]p)))→[]p
A5: (world(∀p((gen p)→(p↔[]α))))
    for every expression α.
A6: ¬((π1 x1...xn)≡(π2 y1...ym))
    where π1 and π2 are different predicate symbols.
A7: ((π x1...xn)≡(π y1...yn))
    ↔((x1[=]y1)∧...∧(xn[=]yn))
A8: ¬((φ1 x1...xn)[=](φ2 y1...ym))
    where φ1 and φ2 are different function symbols.
A9: ((φ x1...xn)[=](φ y1...yn))
    ↔((x1[=]y1)∧...∧(xn[=]yn))

Figure 4: The Modal Rule and Axioms of Z

Example: A particular Z language contains the binary
predicate: loves and the unary predicate: happy, the
unary function father-of, and the 0-ary function John.
The definitions and axioms depending on this
language are as follows. First, the generators are
defined to be those propositions beginning with the
loves and happy properties:

(gen α) =df (∃x∃y(α≡(loves x y)))∨(∃x(α≡(happy x)))

The equality axiom MA6 becomes:

MA6:  x=y → (((loves x z)↔(loves y z))
             ∧((loves z x)↔(loves z y))
             ∧((happy x)↔(happy y))
             ∧((father-of x)=(father-of y)))

Using the above definition of generators, the axiom
scheme A5 becomes:

A5:   (world(∀p((∃x∃y(p≡(loves x y)))
                ∨(∃x(p≡(happy x))))
               →(p↔[]α)))
      for every expression α.

Finally, axiom schemas A6, A7, A8, and A9 become:
A6:   ¬((loves x1 x2)≡(happy y1))
A7a:  ((loves x1 x2)≡(loves y1 y2))
      ↔((x1[=]y1)∧(x2[=]y2))
A7b:  ((happy x1)≡(happy y1))↔(x1[=]y1)
A8:   ¬((father-of x1)[=]John)
A9:   ((father-of x1)[=](father-of y1))↔(x1[=]y1)

3. What is Logically Possible?

Axiom scheme A5, which has a recursively
enumerable number of instances, expresses what is
logically possible to the extent that it can be so
expressed. In the previous example, instantiating α
in A5 to (p≡(happy John)) produces:
(world(∀p((∃x∃y(p≡(loves x y)))
          ∨(∃x(p≡(happy x))))
         →(p↔[](p≡(happy John)))))
By S5 modal logic this becomes:
(world
 ((∀x∀y((loves x y)↔((loves x y)≡(happy John))))
  ∧(∀x((happy x)↔((happy x)≡(happy John))))))
By Axioms A6 and A7 this becomes:
(world((∀x∀y((loves x y)↔#f))
       ∧(∀x((happy x)↔(x[=]John)))))
Since, by definition, worlds are logically possible it
follows that:
<>((∀x∀y((loves x y)↔#f))
   ∧(∀x((happy x)↔(x[=]John))))
Since (<>(α∧β)) implies (<>β) in S5 it follows that:
<>∀x((happy x)↔(x[=]John))

The main problem with using axiom scheme A5 to
prove that something is logically possible lies in
finding the appropriate substitution for the parameter
α. The theorem ZP1 given below is the basis of a
heuristic for finding such instances. Intuitively, we
know that the conjunction of instances of a predicate
and the conjunction of instances of the negation of a
predicate are logically possible whenever the two do
not coincide on any instance. For example:
((p a)∧(p b)∧(¬(p c))) is logically possible but
((p a)∧(p b)∧(¬(p a))) is not. Thus if a sentence can
be written in the form:
(Γ∧(∀x(α→(p x)))∧(∀x(β→(¬(p x)))))
where p does not occur in α, β, and Γ, then it is
logically possible if and only if {x:α}∩{x:β} is
empty, which is to say that ∃x(α∧β) does not follow
from Γ or to say that Γ and ¬∃x(α∧β) is logically
possible. In this manner determining whether a
sentence with n predicates is logically possible can
sometimes be reduced to determining whether a
sentence with n-1 predicates is logically possible
without having to guess any instances of A5.

Theorem ZP1: The possibility of a disjoint predicate
definition: If Γ, α, and β are sentences of Z
containing no unmodalized occurrence of the
predicate π, =, or a propositional variable, π is of
arity n, and x=x1...xn is a tuple of n variables then:

(<>(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))))
↔(<>(Γ∧(¬∃x(α∧β))))

Proof

Assume: π is not in Γ, α, nor β. Then
(<>(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))))
↔(<>(Γ∧(¬∃x(α∧β))))
divides into two cases:
Case 1:
(<>(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))))
→(<>(Γ∧(¬∃x(α∧β))))
resolving on the implications with π and ¬π gives:
(<>(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))
    ∧(∀x(¬α∨¬β))))
→(<>(Γ∧(¬∃x(α∧β))))
which may be rewritten as:
(<>(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))
    ∧(¬(∃x(α∧β)))))
→(<>(Γ∧(¬∃x(α∧β))))
which holds by the laws of S5 modal logic.
Case 2:
(<>(Γ∧(¬∃x(α∧β))))
→(<>(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))))
Using axiom A4 twice we get:
(∃w((world w)∧([w](Γ∧(¬∃x(α∧β))))))
→(∃w((world w)
      ∧([w](Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))))))
Let (h w) represent the world in which (π x) holds if
and only if α holds in w and every other generator
holds if and only if it holds in w:
(h w) := ∀g((gen g)→(g↔((∃x((g≡(π x))∧([w]α)))
                        ∨((¬∃x(g≡(π x)))∧([w]g)))))
By A5 (h w) is a world. From this assignment it
follows that: ([(h w)](∀x((π x)↔([w]α)))). The w in
the conclusion is instantiated to (h w) giving:
((world w)∧([w](Γ∧(¬∃x(α∧β)))))
→([(h w)](Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))))
Since ([(h w)](∀x((π x)↔([w]α)))), (π x) may be
replaced by ([w]α) giving:
((world w)∧([w](Γ∧(¬∃x(α∧β)))))
→([(h w)](Γ∧(∀x(α→([w]α)))∧(∀x(β→¬([w]α)))))
Pushing ¬ through the world w entailment gives:
((world w)∧([w](Γ∧(¬∃x(α∧β)))))
→([(h w)](Γ∧(∀x(α→([w]α)))∧(∀x(β→([w]¬α)))))
The hypothesis may be rewritten giving:
((world w)∧([w](Γ∧(∀x(β→¬α)))))
→([(h w)](Γ∧(∀x(α→([w]α)))∧(∀x(β→([w]¬α)))))
Reducing the scope of [w] in the hypothesis gives:
((world w)∧([w]Γ)∧(∀x(([w]β)→([w]¬α))))
→([(h w)](Γ∧(∀x(α→([w]α)))∧(∀x(β→([w]¬α)))))
Since, by the hypothesis, [w]¬α is implied by [w]β,
it suffices to prove:
((world w)∧([w]Γ)∧(∀x(([w]β)→([w]¬α))))
→([(h w)](Γ∧(∀x(α→([w]α)))∧(∀x(β→([w]β)))))
Reducing (h w) to lower scope gives:
((world w)∧([w]Γ)∧(∀x(([w]β)→([w]¬α))))
→(([(h w)]Γ)∧(∀x(([(h w)]α)→([w]α)))
  ∧(∀x(([(h w)]β)→([w]β))))
Since neither π, =, nor any propositional variable
occurs unmodalized in Γ, α, or β, (h w) entails these
sentences if and only if w does, thus (h w) in the
conclusion may be replaced by w giving:
((world w)∧([w]Γ)∧(∀x(([w]β)→([w]¬α))))
→(([w]Γ)∧(∀x(([w]α)→([w]α)))
  ∧(∀x(([w]β)→([w]β))))
which is a tautology. QED.

ZP1 is applicable to any theory (such as propositional
logic, monadic predicate logic, and the case of a finite
theory) which can be put into a prenex conjunctive
normal form such that no disjunct contains more than
one unmodalized occurrence of π, since by the laws
of classical logic such theories are equivalent to a
disjunction of expressions of the form:
(Γ∧(∀x(α→(π x)))∧(∀x(β→¬(π x)))).
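
The intuition behind ZP1 can be illustrated on a finite domain. The sketch below is an assumption-laden simplification rather than the theorem itself: Γ is taken to be true, α and β are represented only by their extensions (sets of domain elements), and "logically possible" is read as "some extension for π exists". Under those assumptions, a suitable π exists exactly when the two extensions are disjoint. All names are hypothetical.

```python
from itertools import chain, combinations


def subsets(domain):
    """Yield every subset of the domain as a Python set."""
    items = list(domain)
    return (set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def pi_exists(domain, alpha_ext, beta_ext) -> bool:
    """Brute-force search for an extension P of π with
    α ⊆ P (every α-instance satisfies π) and
    β ∩ P = ∅ (no β-instance satisfies π)."""
    return any(alpha_ext <= p and not (beta_ext & p)
               for p in subsets(domain))


def zp1_agrees(domain) -> bool:
    """Check, for every pair of extensions, that a suitable π exists
    if and only if α and β have no common instance."""
    return all(
        pi_exists(domain, a, b) == (not (a & b))
        for a in subsets(domain)
        for b in subsets(domain)
    )
```

For example, pi_exists({0, 1, 2}, {0}, {1}) succeeds, while pi_exists({0, 1, 2}, {0, 1}, {1}) fails because element 1 would have to both satisfy and not satisfy π. This mirrors the contrast between ((p a)∧(p b)∧(¬(p c))) and ((p a)∧(p b)∧(¬(p a))).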


Example: The database K contains the following
facts: Tweety and Chilly-Willy are birds, but (being a
penguin) Chilly-Willy does not fly. By default birds
fly. That is, for all x, if it is possible (with respect to
the theory that Tweety and Chilly-Willy are birds,
and Chilly-Willy does not fly) for x to fly, then x
does fly:

K≡((bird cw)∧(bird tw)∧(¬(fly cw))
   ∧(∀x(((bird x)
         ∧(<(bird cw)∧(bird tw)∧(¬(fly cw))>(fly x)))
        →(fly x))))

The possibility may be rewritten by the laws of Z so
as to divide the positive and negative occurrences of
the predicate fly as follows:
<>((bird cw)∧(bird tw)∧∀z((z[=]x)→(fly z))
   ∧∀z((z[=]cw)→(¬(fly z))))
Using ZP1 with Γ:=(bird cw)∧(bird tw), x:=z, π:=fly,
α:=(z[=]x), and β:=(z[=]cw) gives the equivalent
possibility:
<>((bird cw)∧(bird tw)
   ∧(¬(∃z((z[=]x)∧(z[=]cw)))))
Since [=] is transitive, symmetric and reflexive the ∃z
quantifier may be eliminated giving:
<>((bird cw)∧(bird tw)∧¬(cw[=]x))
The same process is now applied to the bird
predicate. First it is rewritten as:
<>((¬(cw[=]x))
   ∧∀z(((z[=]cw)∨(z[=]tw))→(bird z))
   ∧∀z(#f→(¬(bird z))))
Using ZP1 with Γ:=¬(cw[=]x), x:=z, π:=bird,
α:=(z[=]cw)∨(z[=]tw), and β:=#f gives the equivalent
possibility:
<>((¬(cw[=]x))∧(¬(∃z(((z[=]cw)∨(z[=]tw))∧#f))))
which by classical logic is just: <>(¬(cw[=]x)) which
by Z is just: (¬(cw[=]x)). Therefore, the original
expression then becomes:
K≡((bird cw)∧(bird tw)∧(¬(fly cw))
   ∧(∀x(((bird x)∧¬(cw[=]x))→(fly x))))
which is equivalent in Z to:
K≡((bird tw)∧(bird cw)
   ∧(∀x((bird x)→((fly x)↔¬(x[=]cw)))))
The sentence: ∀x((bird x)→((fly x)↔¬(x[=]cw)))
which contains a necessary equality entails the
non-modal sentence: ∀x((bird x)→((fly x)↔¬(x=cw))).
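
The effect of this default can be mimicked in a small propositional model. The following sketch is only an approximation of the Z derivation above: it assumes the two named birds are the only individuals, represents a world as a truth assignment to (fly tw) and (fly cw), and reads "possible with respect to the theory" as "true in at least one world of the theory". The names BIRDS, theory, possible and apply_default are illustrative assumptions.

```python
from itertools import product

BIRDS = ("tw", "cw")

# A world assigns a truth value to (fly tw) and (fly cw).
WORLDS = [dict(zip(BIRDS, bits))
          for bits in product([False, True], repeat=len(BIRDS))]


def theory(world) -> bool:
    """The fixed theory: Chilly-Willy does not fly."""
    return not world["cw"]


def possible(worlds, bird) -> bool:
    """(fly bird) is possible if it holds in some world of the theory."""
    return any(w[bird] for w in worlds)


def apply_default(theory_worlds):
    """Keep the worlds in which every bird that can possibly fly does fly."""
    return [w for w in theory_worlds
            if all(w[b] for b in BIRDS if possible(theory_worlds, b))]


gamma = [w for w in WORLDS if theory(w)]
k = apply_default(gamma)
```

Under these assumptions k contains the single world in which Tweety flies and Chilly-Willy does not, which matches the conclusion that a bird flies exactly when it is not Chilly-Willy.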

ZP1 implies that contingent definitions added to a
theory are conservative extensions:

Theorem ZP2: Definability of a predicate: If Γ, α are
sentences of Z containing no unmodalized occurrence
of a predicate π, =, or propositional variable then:

(<>(Γ∧(∀x((π x)↔α))))↔(<>Γ)
proof: Let β be ¬α in ZP1 and simplify. QED.

4.  Reflective  Reasoning  in  Z 

One problem with computing defaults with respect to
a theory Γ is that the default itself is not part of the
theory Γ. When Γ essentially contains a disjunction
this often causes the resulting theory to be
inconsistent. For example, if Γ were the statement
that either Tweety does not fly or Chilly-Willy does
not fly then the proposition that Chilly-Willy does
not fly is possible with Γ and likewise the proposition
that Tweety does not fly is also possible with Γ.
Thus both propositions would be true by default.
Unfortunately, their conjunction contradicts Γ if it
were also asserted. Thus, the following is #f:
((bird cw)∧(bird tw)∧((¬(fly cw))∨(¬(fly tw)))
 ∧(∀x(((bird x)
       ∧(<(bird cw)∧(bird tw)
          ∧((¬(fly cw))∨(¬(fly tw)))>(fly x)))
      →(fly x))))

For this reason, the expression is reformulated so that
the default itself becomes a proposition within the
possibility expression. This is done by writing a
reflective (i.e. fixed point) equation in K where the
variable K is made synonymous to the theory and the
default and where K replaces Γ in the default itself:
K≡((bird tw)∧(bird cw)∧((¬(fly cw))∨(¬(fly tw)))
   ∧∀x(((bird x)∧(<K>(fly x)))→(fly x)))
A reflective equation has the form K≡(Φ K) where K
is a propositional constant (i.e. a globally scoped
universally quantified variable such as a Skolem
function). The goal is to transform the initial
equation into a disjunction, ((K≡β1)∨...∨(K≡βn))
with each βi free of K. If K≡βi implies the original
equation then βi is a solution to the original equation.
We call the process of finding solutions to a reflective
equation reflective reasoning.

Example: A Reflective Equation: Tweety and
Chilly-Willy are birds. At least one of them does not
fly. By default birds fly.

(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
    ∧(∀x(((bird x)∧(<K>(fly x)))→(fly x)))))
where bird and fly are 1-ary predicates. To solve this
equation we divide the default into three instances:
(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
    ∧((<K>(fly tw))→(fly tw))
    ∧((<K>(fly cw))→(fly cw))
    ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
          ∧(<K>(fly x)))→(fly x)))))
Splitting into 4 cases we then get:
(((<K>(fly tw))∧(<K>(fly cw))
  ∧(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
       ∧(fly tw)∧(fly cw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x))))))
 ∨((<K>(fly tw))∧(¬<K>(fly cw))
  ∧(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
       ∧(fly tw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x))))))
 ∨((¬<K>(fly tw))∧(<K>(fly cw))
  ∧(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
       ∧(fly cw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x))))))
 ∨((¬<K>(fly tw))∧(¬<K>(fly cw))
  ∧(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x)))))))

which simplifies to:

(((<K>(fly tw))∧(<K>(fly cw))∧(K≡#f))              case 1
 ∨((<K>(fly tw))∧(¬<K>(fly cw))                    case 2
  ∧(K≡((bird tw)∧(bird cw)∧(¬fly cw)∧(fly tw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x))))))
 ∨((¬<K>(fly tw))∧(<K>(fly cw))                    case 3
  ∧(K≡((bird tw)∧(bird cw)∧(¬fly tw)∧(fly cw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x))))))
 ∨((¬<K>(fly tw))∧(¬<K>(fly cw))                   case 4
  ∧(K≡((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>(fly x)))→(fly x)))))))
In case 1, since K is #f, which cannot be possible,
that case is eliminated. In case 4, the negated
possibilities say that [K](¬fly tw)∧[K](¬fly cw), but
since the equation implies that:
[((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
  ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
        ∧(<K>(fly x)))→(fly x))))]K
it could only hold if:
[((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
  ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
        ∧(<K>(fly x)))→(fly x))))]
((¬fly tw)∧(¬fly cw))
but the negation of this is
<>(((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
    ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
          ∧(<K>(fly x)))→(fly x))))
   ∧((fly tw)∨(fly cw)))
which is implied by:
<>(((bird tw)∧(bird cw)∧((¬fly tw)∨(¬fly cw))
    ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw))))→(fly x))))
   ∧((fly tw)∨(fly cw)))
which is true by ZP1. So this case is also false and
may be eliminated. For the remaining two cases the
inequalities are repeated inside the possibilities:
(((<K>(fly tw))∧(¬<K>(fly cw))                     case 2
  ∧(K≡((bird tw)∧(bird cw)∧(¬fly cw)∧(fly tw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>((¬((x[=]tw)∨(x[=]cw)))→(fly x))))
            →(fly x))))))
 ∨((¬<K>(fly tw))∧(<K>(fly cw))                    case 3
  ∧(K≡((bird tw)∧(bird cw)∧(¬fly tw)∧(fly cw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))
             ∧(<K>((¬((x[=]tw)∨(x[=]cw)))→(fly x))))
            →(fly x)))))))

Since the parts inside the <> of the following two
sentences entail K for cases 2 and 3 respectively, and
since they are #t by ZP1:
<(bird tw)∧(bird cw)∧(¬fly cw)∧(fly tw)              for 2
 ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw))))→(fly x)))>
 ((¬((x[=]tw)∨(x[=]cw)))→(fly x))
<(bird tw)∧(bird cw)∧(¬fly tw)∧(fly cw)              for 3
 ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw))))→(fly x)))>
 ((¬((x[=]tw)∨(x[=]cw)))→(fly x))
the above cases 2 and 3 simplify to just:
(((<K>(fly tw))∧(¬<K>(fly cw))                     case 2
  ∧(K≡((bird tw)∧(bird cw)∧(¬fly cw)∧(fly tw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))∧#t)
            →(fly x))))))
 ∨((¬<K>(fly tw))∧(<K>(fly cw))                    case 3
  ∧(K≡((bird tw)∧(bird cw)∧(¬fly tw)∧(fly cw)
       ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw)))∧#t)
            →(fly x)))))))

Substituting the solution for K in each case into the
remaining possibility statements and using ZP1 gives:
((#t∧#t∧(K≡((bird tw)∧(bird cw)∧(¬fly cw)∧(fly tw)
            ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw))))
                 →(fly x))))))
 ∨(#t∧#t∧(K≡((bird tw)∧(bird cw)∧(¬fly tw)∧(fly cw)
            ∧(∀x(((bird x)∧(¬((x[=]tw)∨(x[=]cw))))
                 →(fly x)))))))
By the laws of classical logic this simplifies to:
((K≡((bird tw)∧(bird cw)
     ∧(∀x((bird x)→((fly x)↔¬(x[=]cw))))))
 ∨(K≡((bird tw)∧(bird cw)
     ∧(∀x((bird x)→((fly x)↔¬(x[=]tw)))))))
which results in a disjunction of two solutions; one
where cw is the only bird which does not fly and one
where tw is the only bird which does not fly.
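
The fixed-point character of a reflective equation can be seen in a small brute-force model. This is a sketch under strong simplifying assumptions, not the symbolic procedure used above: only the two named birds exist, a world is a pair of truth values for (fly tw) and (fly cw), a candidate K is a set of worlds, and <K>(fly x) is read as "some world in K makes (fly x) true". The names WORLDS, theory, phi and solutions are hypothetical.

```python
from itertools import chain, combinations, product

BIRDS = ("tw", "cw")

# A world is a pair (fly tw, fly cw).
WORLDS = list(product([False, True], repeat=len(BIRDS)))


def theory(world) -> bool:
    """At least one of the two birds does not fly."""
    return not (world[0] and world[1])


def phi(k) -> frozenset:
    """Right-hand side of the reflective equation, evaluated for a
    candidate K: worlds of the theory in which every bird whose
    flying is possible in K actually flies."""
    poss = [any(w[i] for w in k) for i in range(len(BIRDS))]
    return frozenset(
        w for w in WORLDS
        if theory(w) and all(w[i] for i in range(len(BIRDS)) if poss[i])
    )


def solutions():
    """Enumerate every set of worlds K with K = phi(K)."""
    candidates = chain.from_iterable(
        combinations(WORLDS, r) for r in range(len(WORLDS) + 1))
    return [set(k) for k in map(frozenset, candidates) if phi(k) == k]
```

In this model solutions() returns exactly two fixed points, one in which only Chilly-Willy flies and one in which only Tweety flies, corresponding to the two solutions of the example. Evaluating phi on the worlds of the theory alone gives the empty set, which corresponds to the inconsistent (#f) result of the non-reflective formulation.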

5.  Propositional  Quantifier  Idiom 

When a reflective equation has more than one
solution there arises the question as to which solution
is to be used. One way of avoiding this question is to
adopt the conservative approach of only accepting
what is common to all the solutions. This is achieved
with the idiom: K≡∃k(k∧(k≡α)) which may be read
as the database K is the (possibly uncountably
infinite) disjunction of all the solutions k to the
reflective equation k≡α. When there are a finite
number of solutions β1...βn to such an equation:
(k≡β1)∨...∨(k≡βn), the expression ∃k(k∧(k≡α)) is
equivalent to ∃k(k∧((k≡β1)∨...∨(k≡βn))) which by
the distribution properties of ∧ and ∨ and by the fact
that ∃ associates through ∨ gives the equivalent
expression: (∃k(k∧(k≡β1)))∨...∨(∃k(k∧(k≡βn))).
Since ∃k(k∧(k≡βi)) is logically equivalent to just βi,
K will be equivalent to the disjunction of solutions:
β1∨...∨βn. For example, the disjunction of solutions
from section 4 gives:

K≡((bird tw)∧(bird cw)∧((¬fly cw)↔(fly tw))
   ∧(∀x(((bird x)∧¬((x[=]tw)∨(x[=]cw)))→(fly x))))
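
When solutions are modelled as sets of worlds, their disjunction is simply the union of those sets. The sketch below assumes a two-bird propositional model in which a world is the pair (fly tw, fly cw) and the two solutions of the section 4 example are written out by hand. It checks that their union coincides with the worlds satisfying (¬fly cw)↔(fly tw). The helper names are illustrative only.

```python
from itertools import product

# A world is a pair (fly tw, fly cw).
WORLDS = list(product([False, True], repeat=2))

# The two solutions of the section 4 example, written as world sets
# (an assumed encoding, restricted to the two named birds).
solution_tw_flies = {(True, False)}   # cw is the only non-flying bird
solution_cw_flies = {(False, True)}   # tw is the only non-flying bird


def disjunction(*solutions):
    """The disjunction of propositions is the union of their world sets."""
    return set().union(*solutions)


def exactly_one_flies(world) -> bool:
    """Truth value of (¬fly cw)↔(fly tw) in the given world."""
    fly_tw, fly_cw = world
    return (not fly_cw) == fly_tw


k = disjunction(solution_tw_flies, solution_cw_flies)
```

Here k equals the set of worlds in which exactly_one_flies holds, so the conservative database commits only to what both solutions share: exactly one of the two birds flies.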

6.  Conclusion 

Nonmonotonic  reasoning  captures  a  deep  and 
important  aspect  of  human  reasoning.  Essentially  it 
allows  us  to  express  laws  and  to  reason  without 
cluttering  up  our  laws  with  numerous  exceptions. 
The  Modal  Quantificational  Logic  Z  provides  an 


effective  (i.e.  recursively  enumerable)  method  for 
automatically  deducing  nonmonotonic  consequences 
for  many  interesting  problems.  Z  improves  upon 
previous  methods  of  defining  nonmonotonic 
reasoning  via  complicated  metatheories  by  allowing 
for  quantification  through  modal  scopes  and  by 
eliminating  those  theories'  inherent  ineffective 
processes  for  computing  consistency.  In  a  broader 
sense,  Z  supports  Lewis's  [Lewis  36]  original  claim 
that  modal  logic  is  a  missing  part  of  classical  logic  in 
that  this  modal  logic  also  accounts  for  nonmonotonic 
reasoning. 
Induction, which has been called the "scandal of philosophy", has become the
scandal  of  psychology  and  artificial  intelligence  as  well. 

J.  H.  Holland  et  al.,  Induction  (1986) 


ABSTRACT: In light of the ongoing discussion about
the name of the emerging field, I propose and partly
motivate the name Inductive Semiotic Systems as an
appropriate one. The three postulates given in section 5
capture the essence of the justification for the proposed
name. They can be further reduced to the following. Since
all objects in the universe have emergent compositional
structure, the term "object structure" (and therefore the
term "meaning") cannot be properly understood, defined,
and captured outside the inductive learning process. Such
a process, based on the appropriate formal mathematical
structure, tries to capture the object-class structure by
constructing the "inductive sign" that represents in the
given context both the object and the corresponding class
of objects. Moreover, I conjecture that all other
representations and semiotic processes have evolved
around these, inductive, representations. The main link
with the classical sciences is outlined: the working
hypothesis (which we currently pursue) that the proper
sensors for all intelligent systems are symbolic, or
inductive, measurement devices, a far-reaching
generalization of the classical, or numeric, measurement
devices. In this connection, I briefly review the inductive
nature of the cognitive processes responsible for the
emergence of numbers.

Keywords: intelligent systems, symbolic representations,
numbers, inductive learning, inductive class representation,
evolving transformation system, symbolic (inductive)
measurement devices.

1.  INTRODUCTION 

This paper was motivated by a number of developments
all originating from my work on the foundation of
pattern recognition, begun 20 years ago and intended to
unify the two basic but incompatible approaches: vector
space, or the classical statistical, approach, and formal
grammar, or the syntactic, approach. The work
gradually culminated in a fundamentally new
mathematical model, Evolving Transformation System (ETS),
that attempts to capture formally the nature of
"symbolic" representations [1, 2, 3, 4, 5]. The model
eventually led to the outline of the inductive theory of vision
[6], which, in turn, prompted the re-examination and
generalization of the classical, or numeric, measurement
processes [7]. What, basically, motivated a serious
re-examination of the classical measurement processes
is not, as one might expect, the well-documented
problematic state of the quantum measurement problem,
but the realization that ETS embodies a fundamental
generalization of the classical mathematical concept
of (measurement) space [8, 9] such that the former
cannot be reduced to the latter. In other words, it
became clear that most processes in a symbolic
representation space are not reducible to those in a numeric
space, a fact whose importance cannot be overestimated.
In particular, ETS suggested that the currently
accepted understanding of biological information
processing (inevitably dictated by our current state of
measurement and instrumentation) as proceeding from
numeric to symbolic representations is not correct: all
stages of biological information processing, including the
initial stage of transduction, can properly be understood
using the appropriate symbolic representations
only [10].

This work is partially supported by the NSERC research

Thus, with the understanding of fundamental
irreducibility of the symbolic, or semiotic, processes to the
numeric ones, the question of the structure of intelligent
semiotic systems could be addressed in a more
appropriate and satisfactory manner, i.e. the basic formal
language should be that of a formally specified
"symbolic" mathematical model, or ETS. Since the latter
was developed to capture the nature of inductive
learning processes, it is quite natural to call the emerging
theory the inductive theory of semiotic systems.
Moreover, the name is also (and mainly) justified on the
basis of the fact that, according to the ETS, the
modification of "geometry" of the symbolic representation
space, and, as it turns out, of the very representation
itself, is actually accomplished during inductive
learning processes, which thus emerge as the central
intelligent processes. It appears also that the concept of
emergent representation cannot be adequately
understood outside inductive learning processes.

The exposition in the paper is as informal as
possible in order to be accessible to a wider range of
researchers, and therefore no technical details of the ETS
model, which led to all the considerations, are given.

2.  THE  CENTRAL  ROLE  OF  INDUCTIVE 
LEARNING  PROCESSES 

The pre-eminent role of inductive processes in
epistemology and science has been recognized almost from
the very beginning and increasingly so as science
matured. Thus, passing Bacon, Hume, Mill and confining
ourselves to the latest period, this century, we find, for
example, B. Russell devoting to induction a chapter in
his book "Problems of Philosophy" (1912). In "Outline
of Philosophy" (1927) he states:

Induction raises perhaps the most difficult
problem in the whole theory of knowledge.
Every scientific law is established by
its means, and yet it is difficult to see why
we should believe it to be a valid logical
process. . . . When mankind took to science,
they tried to formulate logical principles
justifying this kind of inference. I will
only say that they seem to me very unsuccessful.
I am convinced that induction must
have validity of some kind in some degree,
but the problem of showing how or why it
can be valid remains unsolved.

Another leading philosopher, A.N. Whitehead, in spite
of the basic philosophical differences with his important
collaborator B. Russell, in "Science and the Modern
World" (1925) also agrees with the latter on the
important  role  of  induction:  "The  theory  of  induction 
is  the  despair  of  philosophy — and  yet  all  our  activities 
are  based  upon  it" . 

Before proceeding further, a "fresh" model-independent
definition of inductive learning process proposed by me
might be useful.

Definition: Given a small finite set C+ of positive
objects that are randomly chosen from a (possibly
infinite) class C (the class to be learned) and a small finite
set C− of negative objects, i.e. not from C, the inductive
learning process has to construct an inductive class
representation (ICR) of C such that

•  ICR can generate an approximation Capp of C,
and

•  if C*app is a noisy perturbation of Capp and C*
is a noisy perturbation of C, then the ratio of
cardinalities 


\C*\ 

is  close  to  1. 

Note  both  the  generative  (and,  therefore,  mainly  dis- 
crete) nature  of  the  ICR,  as  well  as  a  fuzzy  boundary 
of  the  "generated"  class  (and,  therefore,  the  presence 
of  continuous  element  in  the  ICR,  which  is  not  obvious 
from  the  definition).  These  two  features,  combined  in  a 
natural  way,  make  the  above  definition  unique  among 
the  other  definitions  of  inductive  learning  processes. 
Furthermore,  both  of  them  are  supported  by  psycho- 
logical experiments  ([11],  pp.  33-38;  [12],  pp.  97-102). 
The  uniqueness  of  the  definition  is  further  clarified  by 
our recent experiments, which strongly suggest that
the  well  known  "inductive  learning"  models,  including 
connectionist  models,  are,  in  fact,  not  inductive  learn- 
ing models  at  all. 

What makes the inductive learning process a
prime  candidate  for  the  central  cognitive  process?  The 
answer,  I  believe,  is  strongly  suggested  by  the  ubiqui- 
tous fact  that  any  object/event  representation  is  guided, 
to  a  considerable  extent,  by  the  finite  number  of  the 
agent's  encounters,  direct  and  indirect,  with  that  ob- 
ject/event. In  fact,  it  is  not  difficult  to  see  how  the 
construction  of  practically  all  cherished  concepts  in  cog- 
nitive psychology,  including  frames  and  mental  mod- 
els, can  be  accomplished  based  on  the  concept  of  ICR. 
Moreover,  H.  Plotkin  in  [13]  makes  a  book-length  ar- 
gument that  only  inductive  mechanisms  can  allow  each 
biological  species  to  convert  the  "Humean  uncertainty 
. . .  into  a  pragmatic  issue  of  dividing  the  world  into 
band  width  of  frequencies  of  change  and  fluctuation, 
and  [to  employ  these]  . . .  mechanisms  that  are  able  to 
match  the  rates  of  perturbation  of  the  world  with  or- 
ganic structures  ...  to  alter  their  own  states  at  equiv- 
alent rates"  (see  also  [14]). 

3.  INDUCTIVE  ORIGIN  OF  NUMBERS 

Remarkably,  it  appears  that  the  decisive  and  irreplace- 
able role  of  the  inductive  axiom  in  the  well  known 
Peano axiomatization of natural numbers is a reflection
of the inductive cognitive origin of numbers.

Following  [15],  sect.  1.2,  I  first  summarize  the  4 
stages  in  the  emergence  of  numbers.  The  presence 
of the first stage (which evolved in many animals
many millions of years ago) is indicated by the ability
to compare, reasonably accurately, the sizes of some
temporal  and  spatial  sets  of  events  or  objects  (see,  for 
example  [16,  17,  18,  19]). 

The  second  stage  is  manifested  in  the  primitive  cul- 
tures in  the  form  of  the  choice  of  very  few  selected 
reference,  or  standard,  sets  of  objects  mainly  for  the 
purpose  of  explicit  storage  of  various  set  sizes  (see,  for 
example  [16,  17,  18]). 

The  third  stage  is  achieved  when  a  single  reference 
set of objects is selected, and the fourth stage is characterized
by the emergence of the abstract concept of
number,  independent  of  any  particular  set  of  reference 
objects.  One  should  note  that  the  last  stage  is  quite 
recent  (4000-2000  B.C.)  and  it  is  reflected  in  many  of 
our  measures  of  length:  the  height  of  a  horse  is  mea- 
sured in  "hands"  and  length  generally  in  "feet"  (from 
foot)  and  earlier  also  in  "ells"  (from  elbow,  1  ell  =  45 
inches).  Also,  the  word  "digit"  means  not  only  the 
numbers 1, 2, 3, ... but a finger or a toe as well. A similar
phenomenon is observed in practically all languages
[16,  17,  18]. 

One point to observe in the progression of the four
stages (and this actually became perfectly clear
only at the end of the last century) is that a number
is, in fact, a sign representing the (infinite) class of
all finite sets whose sizes are all equal to this number.
This, together with the other point, which will be discussed
next, leads quite naturally, as we shall see, to the hypothesis
proposed in the title of this section: that the concept of
number has been acquired by means of evolutionary
inductive learning processes, in which the evolving agent
gradually acquires the concept of number via cognitive
inductive learning processes that are embedded in the agent
(and are therefore also evolving) and that interact with the
environment.

Let  us  next  look  at  this  process  from  a  little  more 
formal  perspective.  The  definition  of  inductive  learning 
process  given  in  section  2  is  model-independent,  i.e.  no 
concrete  mathematical  model  is  specified  to  account, 
first,  for  the  generativity  and,  second,  for  the  measure 
of  proximity  of  elements  in  the  set.  Thus,  to  continue, 
we  shall  need  to  choose  some  concrete  mathematical 
model satisfying the two general and absolutely essential
conditions of the definition. Since at present no
model except ETS satisfies the requirements of the definition,
ETS is quite naturally my choice. Hence,
in what follows I simply adopt ETS as the model of
inductive learning (see, for example, my paper in
last year's proceedings of this conference [5] or [2, 3, 4]).

In  a  few  words,  ETS  can  be  described  as  a  formal 
mathematical  explication  of  the  idea  of  the  "symbolic" 
object space as opposed to the idea of the "numeric" object
space. The latter is captured, for instance, by the
concept of a Euclidean vector space. It turns out that there
is a profound difference in the nature of the (metric)
geometry for the two classes of spaces, numeric object
spaces forming a very special, "degenerate", subclass of
the class of symbolic object spaces. (The distance between
two symbolic objects, or, more precisely, object
representations, is defined as the total weight of the "shortest"
sequence of weighted symbolic transformations connecting the
two objects, e.g. deletions, insertions and substitutions.)
The important new idea here is that the distance
function is defined by the fixed set of weighted
symbolic  operations,  each  of  which  can  transform  one 
object  into  the  other.  In  other  words,  the  new  idea  is 
that  the  concept  of  distance  is  introduced  via  the  fixed 
set  of  symbolic  operations,  which  in  a  numeric  case,  e.g. 
vector  space,  becomes  "trivial".  Moreover,  the  repre- 
sentation itself  is  not  fully  determined  by  the  "struc- 
ture" of  the  object  itself  but  also  by  the  latter  set  of 
operations,  specifying  the  emergent  multi-resolutional 
structure  that  can  only  be  discovered  during  various 
inductive  learning  processes.  For  example,  in  case  of 
strings,  to  indicate  such  an  emergent  structure,  the 
usual  string  representation  aacdbaacdcabba  is  not 
sufficient, and a more adequate form of representation
might be, for example, a.acd.ba.acd.ca.b.ba, reflecting
the corresponding symbolic operations that are constructed
during the learning stage from the basic single-letter
operations with the help of several fixed
composition-forming operators [3].
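
The distance described above can be illustrated with a minimal sketch. It assumes only the three basic single-letter operations (insertion, deletion, substitution), each with one fixed weight; the function name and the weight parameters are illustrative assumptions, and the learned multi-letter operations of ETS are not modeled here.

```python
def weighted_edit_distance(x, y, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum total weight of single-letter insertions, deletions and
    substitutions that transform string x into string y."""
    m, n = len(x), len(y)
    # d[i][j] holds the cheapest way to turn x[:i] into y[:j].
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_del
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = x[i - 1] == y[j - 1]
            keep_or_sub = d[i - 1][j - 1] + (0.0 if same else w_sub)
            d[i][j] = min(keep_or_sub,
                          d[i - 1][j] + w_del,
                          d[i][j - 1] + w_ins)
    return d[m][n]
```

Changing the weights changes the geometry of the string space: with a substitution weight above the combined insertion and deletion weights, the shortest transformation between "acd" and "abd" becomes a deletion followed by an insertion.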

In  ETS,  the  inductive  class  representation  (ICR), 
constructed  during  the  learning  process,  consists  of  a 
small  selected  subset  of  the  positive  training  set  plus 
the learned set of weighted operations that can act
on the selected training set to generate the approximation
Capp of the class C to be learned (see the Definition in
section 2). The very simple nature of the numeric object
representation  space  manifests  itself  in  the  simplicity  of 
the  ICR  for  the  set  N  of  natural  numbers:  1  &  the  suc- 
cessor operation.  The  latter  can  be  seen,  for  example, 
from  the  Peano  axiomatization  of  N,  which  is  now  uni- 
versally accepted  as  the  standard  one.  This  symbolic, 
or  compositional,  simplicity  becomes  perhaps  more  ap- 
parent when  we  view  natural  numbers  as  the  set  of 
strings  over  a  single  letter  alphabet  {a}: 

N = {a, aa, aaa, aaaa, ...}.

Thus,  looking  at  N  as  the  result  of  application  of  the 
corresponding  ICR  (e.g.  string  aaa  plus  the  deletion- 
insertion  of  a),  it  is  not  difficult  at  all  to  connect  this 
ICR with the actual learning process as reflected in the
above  four  stages:  the  positive  training  set  may  consist 
of  several  strings  of  a's,  for  example,  a,  aaa,  aaaaaa, 
and  the  negative  training  set  is  simply  empty.  Since 
the final set of operations consists of just one single-letter
deletion-insertion operation, no multiresolutional
structure enters the "picture". In this manner a very
natural  connection  is  established  between  the  emer- 
gence of  natural  numbers  and  the  central  intelligent 
processes — inductive  learning  processes. 
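
As a hedged illustration of how such an ICR generates the class, the following sketch starts from a few seed strings over {a} and applies the single-letter insertion/deletion operation a bounded number of times. The function name, the max_ops bound and the rule that deletion never produces the empty string are assumptions made for the example.

```python
def generate_class(seeds, letter="a", max_ops=3):
    """Approximate the class generated by an ICR made of seed strings plus
    one single-letter insertion/deletion operation."""
    current = set(seeds)
    generated = set(seeds)
    for _ in range(max_ops):
        nxt = set()
        for s in current:
            nxt.add(s + letter)        # insertion (the successor step)
            if len(s) > 1:
                nxt.add(s[:-1])        # deletion, never down to the empty string
        current = nxt - generated      # only expand strings not seen before
        generated |= nxt
    return sorted(generated, key=len)
```

For instance, generate_class(["aaa"], max_ops=3) yields the strings of lengths 1 through 6, i.e. an initial segment of N.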

4.  GENERALIZED  MEASUREMENT 
PROCESS  AS  INDUCTIVE  LEARNING 
PROCESS 

In  the  last  section  we  have  connected  the  structure  of 
natural  numbers — and  therefore  of  most  of  our  present- 
day  mathematics — to  the  simplest  form  of  inductive 
class  representation  (ICR),  consisting  of  a  single  object 
plus  a  single  operation.  It  is  not  difficult  to  see  now  the 
gradual  development  of  the  classical  scientific  paradigm 
as  application  of  this  (Peano)  ICR  to  the  representa- 
tion of  various  important  "numeric"  features  of  reality: 
mass, distance, time, speed, etc. I want to emphasize
the fact that in spite of the apparent multitude of
"variables"  introduced  in  basic  sciences,  the  structure 
of  the  representation  space  for  all  these  variables — e.g. 
Euclidean  vector  space — is  essentially  determined  (via 
the  real  numbers)  by  the  structure  of  the  ICR  of  natu- 
ral numbers. 

Given  the  above,  one  can  propose  to  view  more  gen- 
eral, i.e.  "symbolic,  or  semiotic",  measurement  pro- 
cesses simply  as  various  concrete  implementations  of 
general inductive learning processes. (We were led to
these  general  measurement  processes  [7]  after  the  in- 
vestigation of  the  foundations  of  computational  vision 
processes  [6].) 

To  better  understand  the  meaning  of  "symbolic  mea- 
surement process",  let  us  first  look  from  the  new  per- 
spective at  a  concrete  example  of  the  classical,  or  nu- 
meric, measurement  process — length  measurement.  It 
goes  without  saying  that  the  emergence  of  numbers  has 
preceded  the  idea  of  the  ruler.  Hence,  the  idea  of  length 
measurement  should  be  looked  at  as  a  quite  natural 
application  of  the  above  view  of  natural  numbers  (via 
Peano  ICR)  to  the  concept  of  length,  i.e.  the  above  a 
becomes the (linear) unit of measurement.

We  have  recently  proposed  [7]  to  generalize  the  clas- 
sical idea  of  length  measurement  to  the  idea  of  plane 
shape "measurement". The basic idea is to expand the
"measurement alphabet" from A = {a} to, for example,
A' = {a, b, c}, where the nonlinear units b and c
are  the  oriented  concave  and  convex  corners,  as  in  the 
example in [1]. In other words, if the idea of the classical
linear ruler is to assemble successively identical
copies  of  a  single  linear  unit  of  measurement,  the  basic 
idea  of  the  above  "plane  ruler"  is  to  allow  one  to  as- 
semble dynamically  several,  including  nonlinear,  units 
of  measurement  and  to  apply  the  inductive  learning 
process  to  discover  the  corresponding  symbolic  opera- 
tions that  specify  the  multiresolutional  structure  of  the 
shape  representation  as  a  member  of  a  certain  class  of 
shapes.  For  a  more  detailed  exposition  of  this  exam- 
ple see  the  companion  paper  [7],  where,  in  particular, 
it  is  illustrated  how  the  noise  in  the  symbolic  repre- 
sentation can  also  be  eliminated  during  the  inductive 
learning  process  by  discovering  the  corresponding  sym- 
bolic operation  of  weight  zero. 
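
To make the expanded alphabet concrete, here is a minimal sketch of how a plane outline might be written as a string over {a, b, c}. It rests on several assumptions that are not taken from [1] or [7]: the shape is a rectilinear polygon with integer vertices listed counter-clockwise, a is a unit-length segment, c is a convex corner and b is a concave corner. The function name is hypothetical, and the inductive learning stage that would discover higher-level operations is not shown.

```python
def encode_outline(vertices):
    """Encode a rectilinear polygon, given as integer vertices in
    counter-clockwise order, as a string over the alphabet {a, b, c}."""
    n = len(vertices)
    symbols = []
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        x2, y2 = vertices[(i + 2) % n]
        dx, dy = x1 - x0, y1 - y0
        # One 'a' per unit of length along the current edge.
        symbols.append("a" * (abs(dx) + abs(dy)))
        # Sign of the cross product of this edge and the next one:
        # a left turn on a counter-clockwise outline is a convex corner.
        cross = dx * (y2 - y1) - dy * (x2 - x1)
        symbols.append("c" if cross > 0 else "b")
    return "".join(symbols)
```

A 2-by-2 square is encoded as aacaacaacaac, while an L-shaped outline picks up exactly one b at its inner corner.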

5.  INDUCTIVE  SEMIOTIC  SYSTEMS 

In  the  tradition  of  these  conferences,  I  propose  to  ad- 
dress both  the  "computational  semiotics"  ([20],  p.  155) 
and  the  semiotics  understood  by  Metz  as  "the  formal- 
ization of  the  natural  sciences"  ([21],  p.  30)  in  one 
framework — Inductive  Semiotic  Systems  (ISS).  The  cor- 
responding new  science  can  then  be  defined  as  the  sci- 
ence concerned  with  the  study  and  modeling  of  au- 
tonomous (including  biological)  systems  that  can  sym- 
bolically encode  a  functionally  necessary  (but  possibly 
unbounded)  class  of  events/objects  in  the  environment 
after  an  exposure  to  a  very  small  number  of  events  from 
the  class. 

As  was  partly  outlined  in  the  last  section,  the  ques- 
tions related  to  "the  problem  of  how  signals  from  sen- 
sors can  be  reliably  transformed  into  symbolic  data 
structures  that  are  suitable  for  logical  reasoning"  [22] 
are  also  addressed  by  the  proposed  framework  but  in  a 
manner  much  more  radical  than  implied  by  that  state- 
ment: the  sensors  themselves  must  embody  inductive 
(symbolic) measurement devices. Such intelligent devices
(in contrast to the classical sensors, which are based on the very
simple form of the ICR) would be modeled in accordance
with  the  new  mathematical  structure,  ETS,  that  al- 
lows the  measurement  device  to  capture  in  a  much  less 
restrictive,  symbolic,  form  the  structure  of  the  corre- 
sponding class  of  events  in  the  environment.  The  out- 
puts of  such  measurement  devices,  as  in  the  case  of 
biological  systems,  would  be  the  current  multiresolu- 
tional symbolic  representations  of  events,  or  objects. 
Hence  no  transduction  of  the  numeric  input  into  sym- 
bolic representation  is  necessary.  Moreover,  as  became 
gradually  clear  to  us,  such  a  transduction  appears  to  be 
meaningless,  since,  as  was  mentioned  above,  the  class 
of  numeric  spaces  forms  a  degenerate  subclass  of  the 
class  of  symbolic  spaces,  so  that  in  this  case  the  very 
rationale for symbolic representation collapses.

Semiotics  is  often  defined  as  a  field  "involved  in 
analyses of trilateral unity: object-sign-meaning" ([23],
p. 45). The latter view goes back to C. Peirce, who
introduced semiotics as "the doctrine of the essential
nature and fundamental varieties of possible semiosis",
where "by semiosis I [he] mean[s] an action, an
influence, which is, or involves, a cooperation of three
subjects, such as a sign, its object and its interpretant,
this tri-relative influence not being in any way resolvable
into actions between pairs" ([21], p. 15).

In accordance with the above (ISS) view of semiotics,
one can introduce a somewhat more precise perspective
on the "analysis of the trilateral unity", as
the study and modeling of the following triad: the
combinative, or compositional, object structure, the
corresponding sign (i.e. object representation) structure,
and the mathematical structure that captures the
former. And it is the inductive learning processes, through
the  discovery  of  the  object-class  structure,  that  relate 
the members of the triad. This interpretation, as well
as the following postulates, is adopted from the similar
view of computational vision given in [6]. The
following  three  postulates  may  be  viewed  as  a  more  de- 
tailed description  of  the  relation  between  the  members 
of  the  triad. 

Postulate 1. All objects in the universe have emergent
compositional structure. Moreover, the term "object
structure", and therefore the term "meaning", cannot
be properly understood and defined outside the
inductive learning process.

Postulate  2.  The  inductive  learning  process  is 
an  evolving  process  that  tries  to  capture  the  emergent 
compositional  object-class  structure  mentioned  in  Pos- 
tulate 1.  The  mathematical  structure  on  which  the 
inductive  learning  model  is  based  should  have  the  in- 
trinsic capability  to  capture  the  evolving  compositional 
object  structure. 

(As  discussed  in  section  3,  the  appropriate  mathemat- 
ical structure  is  fundamentally  different  from  the  clas- 
sical mathematical  structures.) 

Postulate  3.  All  basic  representations,  i.e.  signs, 
are constructed on the basis of the "inductive signs",
which,  in  turn,  are  constructed  by  the  inductive  learn- 
ing processes  (see  Postulate  2).  Thus,  the  inductive 
learning  processes  form  the  core  around  which  all  other 
semiotic  processes  have  evolved. 

6.  CONCLUSION 

I  have  proposed  to  put  the  inductive  object-class  rep- 
resentation constructed  during  the  inductive  learning 
process  at  the  basis  of  the  semiotic  "analysis  of  the  tri- 
lateral unity  [of]  object-sign-meaning"  [23].  There  is 
considerable evidence (mainly not discussed in this paper)
to suggest that the inductive learning processes
are the central intelligent processes responsible for the
construction of the current basic object representations
("signs"). The emergent symbolic operations constructed
during  learning  and  inducing  on  the  symbolic  space 
the  "optimal  (metric)  geometry"  clarify  the  meaning 
of  "meaning"  as  well  as  the  meaning  of  the  multireso- 
lutional  representation. 

As  discussed  in  section  3,  the  emergence  of  num- 
bers appears  to  be  a  result  of  application  of  the  per- 
ceptual inductive  learning  processes  to  the  representa- 
tion of  the  set  size  only,  the  latter  being  a  very  re- 
stricted, numeric,  form  of  inductive  object-class  rep- 
resentation. Moreover,  the  application  of  the  general 
inductive  learning  processes  to  a  less  trivial,  symbolic, 
form  of  object-class  representation  suggests,  in  partic- 
ular, the  future,  qualitatively  quite  different,  line  of 
development  of  mathematics,  as  was  also  suggested, 
for  example,  in  ([24],  p.  36).  Such  "symbolic"  rather 
than  "numeric"  mathematics  would  be  about  the  en- 
tities that  are  compositionally  structured  in  a  manner 
less  "trivial",  or  more  general,  than  the  "numeric"  en- 
tities. In  connection  with  this,  it  is  interesting  to  note 
the considerable (non-technical) similarity of Fodor and
Pylyshyn's  important  argument  [25]  in  favor  of  sym- 
bolic representations  as  opposed  to  numeric  ones  on 
the  basis  of  thoughts  having  non-trivial  compositional 
structure  and  directed  against  the  connectionist  cogni- 
tive models. 

The  real  possibility  of  development  and  the  nature 
of  symbolic,  or  inductive,  measurement  devices  are  also 
briefly  discussed,  in  order  to  suggest  the  essential  sci- 
entific link  between  the  classical  sciences  and  that  of 
the  inductive  semiotic  systems. 

Thus,  the  present  paper  can  be  considered  as  an 
attempt  to  fundamentally  address  the  great  recurring 
"scandal"  mentioned  in  the  epigraph. 
Abstract. This paper introduces an approach for
analysis  of  systems  employing  unity  of  learning  and 
planning  processes.  An  architecture  of  an  automaton  with 
joint  learning  and  planning  (LPA)  is  presented  which 
determines  a  class  of  learning  algorithms.  An  algorithm  of 
learning  that  employs  grouping,  focusing  attention  and 
combinatorial  search  is  described.  LPA  searches  for  an 
appropriate  set  of  rules,  and  then,  for  a  preferable  motion 
trajectory.  It  creates  top  down  planning/control  processes 
which  are  based  on  multigranular  hierarchies  of  knowledge 
and  lead  to  the  temporal  evolution  of  the  system. 

Key  words:  complexity,  control,  evolution  in  biology, 
experiences,  focusing  attention,  generalization,  granularity, 
grouping,  intelligence,  multigranular  knowledge  hierarchies, 
multiresolutional  architecture,  planning. 

1.  Introduction 

In  1986  the  concept  of  "baby-robot"  was  introduced  in 
[1].  This  concept  emphasizes  processes  of  unsupervised 
learning  and  is  associated  with  the  joint  evolution  of 
behavior  and  knowledge  incorporated  within  a  system.  It 
seems  plausible  that  learning  from  multiple  experiences  is 
inseparable  from  the  architecture  of  applying  the  acquired 
knowledge  for  shaping  the  desirable  behavior. 

We  introduce  a  formal  system — a  learning  automaton 
which  has  the  faculties  required  to  obtain  and  use  knowledge: 
sensors,  subsystems  for  storing  and  organizing  information, 
subsystems  for  generating  commands,  actuators  for  changing 
the  world,  but  initially  it  has  no  world  model  and  no  rules  to 
achieve  a  goal.  This  knowledge  should  be  learned  and  our 
goal  is  to  understand  how. 

The  concepts  of  experience  as  a  part  of  behavior,  as  well 
as  desirability  of  behavior  and  its  components-experiences, 
will  be  introduced  and  explored  (Section  3).  Behavior  is 
shaped  by  the  actions  which  emerge  as  a  result  of  planning 
the  goal-states,  actions  that  lead  to  goal-states,  and  strings  of 
them.  The  process  of  finding  the  string  of  desirable  states 
and  actions  leading  to  them  is  called  planning.  The 
latter would be impossible unless the ability to judge the
degree of desirability of states and actions had been
acquired in advance. Generating and applying proper
commands  to  execute  the  planned  trajectory  and  compensate 
for errors is called execution. Both planning and execution
are a part of control. The process of acquiring the relevant
information  and  processing  it  so  that  a  proper  behavior  could 
be  generated  is  called  learning.  Planning/control  and  learning 
are  complementary  procedures  of  intelligent  computation. 

The  process  of  planning  starts  with  focusing 
attention  which  selects  the  initial  representation  of  the  world 
or  the  map  with  its  boundaries.  The  space  is  discretized  into 
tessellata  which  determine  the  space  resolution,  or 
granularity.  Combinatorial  search  is  performed  as  a  procedure 
of  choosing  one  string  (the  minimum  cost  string)  out  of  the 
multiplicity  of  all  possible  strings  formed  out  of  the  space 
tessellata  at  this  particular  level  of  resolution. 

Grouping  the  tessellata  in  a  variety  of  feasible  strings 
allows  selection  of  one  of  them.  This  determines  an 
envelope  of  focusing  attention  around  the  vicinity  of  the 
minimum  cost  string.  All  information  within  the  envelope 
is  being  transformed  to  the  higher  level  of  resolution  and 
grouping  of  behavior  units  starts  again.  Focusing  attention 
presumes  proper  distribution  of  nodes  in  the  state  space  so 
that  no  unnecessary  search  be  performed.  Combinatorial 
search  forms  the  alternatives.  All  of  these  three  procedures 
together  can  be  considered  as  a  process  of  generalization. 
Generalization  is  a  generation  of  the  representation  at  the 
lower  level  of  resolution. 
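
The interplay of the three procedures can be sketched as a coarse-to-fine search. The sketch below is only an illustration under stated assumptions: the world is a rectangular grid of cell costs whose sides are divisible by the block size, strings are 4-connected, the start and goal are the top-left and bottom-right cells, generalization is a block mean, and combinatorial search is implemented with Dijkstra's algorithm. None of the function names come from the paper.

```python
import heapq


def min_cost_string(cost, allowed=None):
    """Dijkstra search for the cheapest 4-connected string of cells from
    the top-left to the bottom-right cell, optionally inside an envelope."""
    rows, cols = len(cost), len(cost[0])
    start, goal = (0, 0), (rows - 1, cols - 1)
    best = {start: cost[0][0]}
    parent = {}
    queue = [(cost[0][0], start)]
    while queue:
        c, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if c > best[cell]:
            continue                      # stale queue entry
        r, k = cell
        for nxt in ((r + 1, k), (r - 1, k), (r, k + 1), (r, k - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if allowed is not None and nxt not in allowed:
                continue                  # outside the envelope of attention
            nc = c + cost[nxt[0]][nxt[1]]
            if nc < best.get(nxt, float("inf")):
                best[nxt] = nc
                parent[nxt] = cell
                heapq.heappush(queue, (nc, nxt))
    path, cell = [goal], goal
    while cell != start:
        cell = parent[cell]
        path.append(cell)
    return path[::-1], best[goal]


def generalize(cost, block):
    """Lower-resolution grid: each coarse cell is the mean of one tile."""
    rows, cols = len(cost) // block, len(cost[0]) // block
    return [[sum(cost[r * block + i][k * block + j]
                 for i in range(block) for j in range(block)) / block ** 2
             for k in range(cols)] for r in range(rows)]


def plan(cost, block=2):
    """Search at low resolution, then refine inside the resulting envelope."""
    coarse_path, _ = min_cost_string(generalize(cost, block))
    envelope = {(r * block + i, k * block + j)
                for r, k in coarse_path
                for i in range(block) for j in range(block)}
    return min_cost_string(cost, allowed=envelope)
```

The high-resolution search visits only cells under the coarse minimum cost string, which is the sense in which focusing attention avoids unnecessary search.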


Figure  1.  Multiresolutional  Consecutive  Knowledge 
Acquisition  by  the  virtue  of  Generalization 


An  example  of  joint  functioning  of  these  three 
operations  is  shown  in  Figure  1.  The  operations  work 
jointly  as  a  triplet,  which  can  be  considered  an  elementary 
unit of intelligence [2]. The process is shown in Figure 1
for  learning.  It  presumes  generalization  of  high  resolution 
units  of  information  into  lower  resolution  clusters.  This 
process  can  proceed  in  the  opposite  direction,  from  a  lower 
resolution to a higher resolution, i.e. counterclockwise.
Then  it  will  demonstrate  planning  and  presume  instantiation 
via  decomposition  of  low  resolution  units  into  higher 
resolution.  Both  generalization  and  instantiation  are  the  key 
processes  of  intelligence. 

All  of  these  phenomena  presume  that  the  system  under 
consideration  can  "behave"  as  a  result  of  its  perceiving  the 
states,  planning  the  decisions  and  acting  by  using  the 
actuators.  The  overall  system  can  be  represented  using  the 
concept  of  elementary  loop  of  functioning  [2-4]. 

This  paper  demonstrates  that  learning  leads  to 
emergence  of  a  multiresolutional  representation  and  a 
hierarchy  of  planning/control. 

2.  Algorithms  of  Unsupervised  Learning 
with  Combinatorial  Enhancement  and 
Generalization 

2.1  Learning  Automata.  An  automaton  is  a  state 
machine  which  generates  outputs  by  using  its  transition 
function  and  state-output  function  tabulated  in  advance  for 
all  possible  states.  Conventional  automata  are  capable  of 
demonstrating  "reactive  behaviors"  according  to  the 
prescriptions  stored  in  their  transition  and  output  functions 
[21,  22].  In  this  paper,  a  class  of  learning  automata  is 
outlined  which  are  state  machines  whose  transition  and  state- 
output  functions  are  updated  and  modified  based  upon  prior 
experiences  which  are  stored  in  the  memory  and  transformed 
into  sets  of  rules.  The  transition  and  output  functions  of 
these  automata  have  open  lists  of  rules.  New  rules  can  be 
added  to  these  lists. 
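
A minimal sketch of such an automaton, assuming that the transition and output functions are stored as dictionaries to which rules can be added at run time; the class and method names are illustrative and not taken from the paper.

```python
class LearningAutomaton:
    """State machine whose transition and output tables are open lists of rules."""

    def __init__(self, state):
        self.state = state
        self.transition = {}   # (state, input symbol) -> next state
        self.output = {}       # state -> output

    def add_rule(self, state, symbol, next_state, output):
        """Add a rule derived from experience; a later rule for the same
        (state, symbol) pair replaces the earlier one."""
        self.transition[(state, symbol)] = next_state
        self.output[next_state] = output

    def step(self, symbol):
        """React to an input symbol; return None if no stored rule applies."""
        key = (self.state, symbol)
        if key not in self.transition:
            return None
        self.state = self.transition[key]
        return self.output[self.state]
```

A conventional automaton corresponds to the case in which add_rule is called only before operation starts; the learning automaton keeps calling it as experiences accumulate.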

This  concept  is  similar  to  the  one  described  in  [5].  In 
this paper, we will equip learning automata with a new
capability:  to  synthesize  their  output  by  combining  together 
previously  stored  rules  in  search  of  the  most  appropriate 
behavior.  In  addition  to  reactive  responses,  our  learning 
automata  demonstrate  the  skill  of  deliberation,  or  planning. 

This  new  type  of  automata  will  be  called  Learning  and 
Planning  Automata  (LPA).  LPA  are  presumed  to  have  a 
learning  system  L  which  allows  for  enriching  both  the 
transition  and  the  output  functions  and  a  planning  system, 
as  a  part  of  the  mechanism  of  behavior  generation  BG. 

As  the  system  of  rules  develops  it  becomes  a 
hierarchical  one.  This  is  equivalent  to  formation  of  the 
hierarchy  of  automata  as  a  result  of  the  evolution  of  a  single 
learning  automaton  and  affects  the  corresponding  input  and 
output vocabularies. Operators of L are equipped with a
minimal initial set of tools including the ability to form
strings,  to  construct  hypotheses,  and  to  infer  tautologies. 

Learning system L can be defined as a system of
acquisition of experiences, transformation of these
experiences into rules of action, derivation of new concepts,
and organization of these concepts into an enhanced knowledge
base (an entity-relational network for the long-term
knowledge called an ontology, a base of hypotheses, a concept
base, etc.). This improves decision-making for achieving the goal.

In order to support processes of learning, each learning
automaton is equipped with a set of actuators that follow
commands at the output of the learning automaton. Changes in
the world are measured by the sensors. A set of sensors is the
only source of information for the learning system about the
state of the World (state). The automaton is also equipped with
subsystems of Sensory Processing and World Model which
allow for interpreting the input from sensors.

The Learning System can judge the truthfulness of
this  information  only  by  the  results  of  actions  (behavior) 
which  are  undertaken  to  achieve  the  goal.  The  subsystem  of 
Behavior  Generation  (BG)  contains  transition  function  and 
output  function  [2,  3,  6].  Other  devices  of  BG  are  described 
in Section 5. This approach was presented earlier in [7, 8].

The  state  of  the  world  is  understood  as  a  set  of  n 
sensors'  outputs  which  arrive  at  a  particular  moment  of  time. 
Action  is  a  set  of  m  action  outputs  which  are  generated  by 
the  system  between  two  consecutive  states.  States  can  be 
represented  as  sets  (lists),  or  as  vectors  within  a  particular 
system  of  coordinates.  Actions  are  understood  as  causes  of 
changes  that  are  sensed  after  the  actions  are  applied. 

Goal  is  a  state  which  must  be  achieved  as  a  result  of  the 
behavior.  The  goal  is  often  presumed  to  be  given  to  a 
system.  From  [2-6]  we  know  that  goals  can  emerge  also  as  a 
result  of  planning.  From  [3,4]  we  know  that  the  process  of 
learning  generates  subgoals. 

It  is  presumed  that  any  valued  experience  is  associated 
with  a  certain  measure  of  "goodness".  It  can  be  interpreted  as 
a  reward  for  the  pursuit  of  the  goal  G.  The  value  of  reward 
will  be  used  for  the  subsequent  process  of  hypotheses 
generation  and  selection.  Ultimately,  it  determines  the  rules 
necessary  for  "survival."  A  rule  can  be  obtained  as  a  result  of 
transforming  the  cause-effect  relations  discovered  within 
repetitive  experiences. 

After  the  rules  are  confirmed,  some  of  their  parts  which 
emerged  as  a  result  of  generalization,  may  not  belong  to  the 
initial  vocabularies.  These  parts  are  to  be  treated  as  new 
concepts.  Concept  is  a  label  for  the  entity  which  can  be 
obtained  recursively  as  a  cluster  or  group  of  higher  resolution 
concepts.  All  new  concepts  are  obtained  from  two  available 
classes of rules (see Figure 2). States and actions are concepts
and  so  are  clusters  of  states  and  actions,  as  well  as  their 
components. 

(S1, A12, S2, J12)     EXPERIENCES (G)


RULE [IF (S1, G) -> THEN A12]      RULE [IF (S1, A12) -> THEN S2]
CANDIDATES FOR CONCEPT BASE


Figure 2. Experiences -> Rules -> Concepts
(S-state, A-action, J-reward, G-goal)
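
The transformation in Figure 2 can be sketched as follows. This is an illustration under assumptions that the paper does not fix: an experience is a tuple (s1, a12, s2, j12), a cause-effect relation counts as "repetitive" once it has been observed at least min_support times, and the control rule for a state picks the action with the highest mean reward. The function name and the min_support parameter are hypothetical.

```python
from collections import Counter, defaultdict


def derive_rules(experiences, goal, min_support=2):
    """Turn (s1, a12, s2, j12) experiences gathered while pursuing `goal`
    into control rules and event rules."""
    effects = defaultdict(Counter)   # (s1, a12) -> counts of observed s2
    rewards = defaultdict(list)      # (s1, a12) -> observed rewards j12
    for s1, a12, s2, j12 in experiences:
        effects[(s1, a12)][s2] += 1
        rewards[(s1, a12)].append(j12)

    event_rules, best = {}, {}
    for (s1, a12), outcomes in effects.items():
        s2, count = outcomes.most_common(1)[0]
        if count < min_support:
            continue                 # not repetitive enough to be a candidate
        event_rules[(s1, a12)] = s2  # RULE [IF (S1, A12) -> THEN S2]
        mean_j = sum(rewards[(s1, a12)]) / len(rewards[(s1, a12)])
        if s1 not in best or mean_j > best[s1][1]:
            best[s1] = (a12, mean_j)

    # RULE [IF (S1, G) -> THEN A12]: the best-rewarded action for each state.
    control_rules = {(s1, goal): a12 for s1, (a12, _) in best.items()}
    return control_rules, event_rules
```

The states and actions appearing in the returned rules are the candidates for the concept base.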

The new concepts obtained from the generalized
experiences as a result of their transformation into rules are
novel words which are not present in the lists of previously
defined input and output vocabularies. These novel words are
words of the new vocabularies at the lower level of
resolution. They describe the phenomena related to groups of
units of experiences.

2.2  The  Algorithm  of  Inductive 
Generalization.  The  following  sequence  of  activities  can 
be  explicated  from  the  definition  of  learning: 

1. Experiences are collected and stored in memory.

2. Experiences are compared, resemblances are determined,
and clusters are formed by virtue of resemblance.
Clusters of experiences which already have cause-effect
relationships are transformed into rules, and control rules and
event rules are separated.

3.  Search  for  the  meaningful  cause-effect  relationships  can 
be  done  not  only  among  the  stored  set  of  experiences  but 
among  the  synthesized  experiences  too. 

4.  The  clusters  of  states  and  actions  that  are  parts  of  the 
newly  created  rules  are  stored  as  concepts  of  lower  resolution; 
then  the  growth  of  the  concept  base  begins.  New  concepts 
emerge  as  a  result  of  clustering:  the  new  clusters  are  labeled 
and  receive  a  status  of  a  new  word. 

5.  Each  of  the  clusters  of  similar  enhanced  experiences  is 
considered  to  be  a  candidate  for  becoming  a  rule  hypothesis. 

6.  The  hypotheses  are  stored.  Subsequently,  the  new 
experiences  are  stored  in  parallel  in  two  new  vocabularies:  the 
original  and  the  one  formed  by  the  newly  created  concepts. 

7.  The  process  of  consecutive  operations 

{collection  of  experience — >hypothesis  formation — > 
— >generation  of  rules — >concepts  emergence} 
is  repeated  each  time  as  a  new  experience  arrives. 

8. Hypotheses are validated by statistics of their use and
then enter the base of hypotheses. As vocabularies grow,
their new words can be used to represent and control the
functioning of the system. The sequence of steps 1-6 can then
be repeated, which forms a lower level of representation.
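
The steps above can be sketched in Python. This is only a minimal
illustration under stated assumptions, not the authors'
implementation: resemblance is approximated by rounding states to a
coarse grid (the hypothetical `coarsen` helper), validation is a
simple occurrence threshold (`min_support`), and the reward value is
carried in each experience but ignored.

```python
from collections import Counter, defaultdict


def coarsen(state, cell=1.0):
    """Map a state (tuple of floats) to a lower-resolution label (assumed grid)."""
    return tuple(int(v // cell) for v in state)


def generalize(experiences, cell=1.0, min_support=3):
    """Cluster experiences by resemblance and keep well-supported rules.

    Each experience is (state, action, next_state, reward).
    Returns (rules, concepts), where rules maps
    (coarse_state, action) -> coarse_next_state.
    """
    clusters = defaultdict(list)
    for state, action, next_state, _reward in experiences:
        # Experiences that fall into the same coarse cell "resemble" each other.
        clusters[(coarsen(state, cell), action)].append(coarsen(next_state, cell))

    rules, concepts = {}, set()
    for (coarse_state, action), outcomes in clusters.items():
        effect, count = Counter(outcomes).most_common(1)[0]
        if count >= min_support:  # hypothesis validated by statistics of use
            rules[(coarse_state, action)] = effect
            # Clusters that take part in a rule become new low-resolution words.
            concepts.update([coarse_state, effect])
    return rules, concepts
```

Applying `generalize` again to experiences expressed in the new
concepts, with a larger `cell`, would give the next, coarser level.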

The algorithm (Figure 3) describes a recursive process
which leads to a multiresolutional (MR) system of world
representation and an MR system of rules of actions, both
acquired as a result of learning. Inductive generalization
claims that multiple occurrences of similar experiences
testify to the existence of a particular rule if most of these
occurrences have the same (or similar) explanation of causes
[14]. If the number of occurrences is not statistically
persuasive, then we can talk about hypothesis generation by
means of abductive generalization. In both cases, it is
important to account for the list of attributes/variables and
for the relations among them [15].

Learning  employs  the  algorithm  of  generalization  which 
presumes  a  multiple  iteration  of  the  triplet  from  Figure  1: 
focusing  attention  on  the  subset  of  experiences  and 
searching  among  them  and  their  combinations  until  a 
grouping  can  be  performed,  i.e.  a  cluster  of  similar  units 
could  be  substituted  by  a  single  generalized  hypothesis  [10]. 
If  the  subsequent  experiences  confirm  the  hypothesis,  it 
becomes  stronger.  Automata  with  generalization  have  not 
been  previously  discussed  in  the  literature  on  learning. 

Since the algorithm of generalization is applied
recursively to its own results, each two adjacent levels of
resolution use generalization applied to the initial
information, which can be considered experiences, and to
generalized information, which is called hypotheses. Thus, we
can distinguish two kinds of generalization pertaining to the
level to which they are applied: generalization from
experiences before any hypothesis is available, and
generalization from hypotheses.

Figure 3. The Generalization Algorithm

Learning with generalization allows for using
experiences in a very efficient way: it is a tool for reducing
complexity. In order to obtain innovations, some
combinatorics should be introduced for generating words
beyond the existing experiences. A similar mechanism has
proven to be very useful for design purposes [10].

2.3 Combinatorial Enhancement: Searching
for Hidden Implications. Each experience can testify
only about some part of the overall state. The available
sensor information is not necessarily a good basis for
generating a hypothesis. Consider the example of an
autonomous mobile robot which must learn how to act in a
particular environment. In a particular state, a single actuator
command cannot be a proper response. Frequently, a
combined command (steering + propulsion) must be
assigned. Neither the sensor reading of the "heading" angle
nor the value of the "angle to the goal" can, taken alone,
be an antecedent in a rule for what to do. However, their
difference is a perfect control variable.

Enhancement creates a new set containing the initial
set and all possible combinations of its elements.

Figure 4. Evolution of Acquired Knowledge

The enhanced state is a set formed by a state, all
possible combinations of its components, and all possible
combinations between components of the action A and the
state S.
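
A minimal sketch of enhancement for the robot example, under the
assumption that "combination" is restricted to pairwise differences
of components (the text allows any combination). The function name
`enhance` and the component names are hypothetical.

```python
from itertools import combinations


def enhance(state):
    """Return the state's components plus all pairwise differences.

    `state` maps component names to numeric sensor values.
    """
    enhanced = dict(state)
    for (name_a, val_a), (name_b, val_b) in combinations(state.items(), 2):
        # Each derived variable is a candidate antecedent for a rule.
        enhanced[f"{name_a}-{name_b}"] = val_a - val_b
    return enhanced
```

For a state with a heading angle and an angle to the goal, the
enhanced state also contains their difference, which is the control
variable the text singles out.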

Two such persuasive examples of learning, the
evolution of living creatures and the evolution of knowledge,
would be substantially impaired if no combinatorial
enhancement were possible.

The  following  factors  are  considered  critical  for 
stimulating  evolution  of  living  creatures:  reproduction, 
mutation,  competition,  selection  [9].  These  factors  are 
applied  to  knowledge  in  a  similar  way. 

The  algorithm  of  generalization  creates  new  hypotheses, 
rules,  and  concepts  which  become  new  words  in  vocabularies. 
The  same  algorithm  is  applied  to  its  own  results.  Now  the 
previously  generalized  experiences  are  generalized  again,  and 
the  hypotheses  and  rules  of  "lower  resolution"  are  obtained. 
Evolution  of  acquired  knowledge  is  illustrated  in  Figure  4. 
This  diagram  shows  that  the  joint  system  of  knowledge 
representation  and  behavior  generation  converges  to  a 
multiresolutional  one. 

2.4  Searching  for  Valid  Hypotheses  among 
Clusters.  The  reason  behind  the  creation  of  clusters  is  our 
belief  that  each  of  these  clusters  is  a  candidate  to  become  a 
hypothesis  of  a  rule  of  a  different  behavior  [7,  8]. 

The hypotheses are then stored in the database of
hypotheses as a tree where each hypothesis is related to its
"parent" by the goal. If no hypothesis is found for a certain
state, then the state of the stored hypothesis closest to the
current state becomes a subgoal. This is one more source for
the emerging hierarchies of acquired knowledge.
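
The subgoal rule can be sketched as follows. This is an
illustration only: the hypothesis base is flattened to a dictionary
from states to actions (the tree structure is omitted), and
"closest" is assumed to mean Euclidean distance.

```python
import math


def select_action_or_subgoal(state, hypotheses):
    """Look up a hypothesis for `state`, or fall back to a subgoal.

    `hypotheses` maps a state (tuple of numbers) to the action believed
    to lead toward the goal. Returns ("action", a) if a hypothesis covers
    `state`, otherwise ("subgoal", s), where s is the closest covered state.
    """
    if state in hypotheses:
        return ("action", hypotheses[state])
    nearest = min(hypotheses, key=lambda s: math.dist(s, state))
    return ("subgoal", nearest)
```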

2.5 Combinatorics of refinement. When
hypotheses are sought within the level of higher
resolution, the objects and experiences are supposed to be
"enhanced" combinatorially in a way similar to the one used at
a level. This enhancement is used for the problems of design.
It is a part of the process of "task decomposition" [3, 6].

3.  Behavior  Generation  as  a 
Multiresolutional  Search  for  a  Motion 
Trajectory  in  the  State  Space 

The  second  process  characteristic  for  the  learning 
automaton  is  Behavior  Generation  (BG)  (see  [4,16-20].)  This 
process  is  understood  as  a  sequence  of  top-down  planning 
activities  which  end  with  receiving  a  set  of  control 
commands.  The  purpose  of  learning  is  to  enable  the 
subsequent  process  of  planning.  Conventional  automata  are 
only  capable  of  reactive  decisions;  they  are  not  capable  of 
"look-ahead"  decision  making  processes  which  are  typical  for 
deliberative  planning. 

Behavior generation consists of two components: goal
refinement and trajectory generation. The goal is given from
the upper level at a particular resolution. In order to plan the
trajectory, this representation of the goal should be refined
("job assignment"). Then, "each member of the team" can plan
its trajectory ("scheduling"). Unlike learning, which develops
bottom-up and works via generalization (or coarsening), the
process of planning develops top-down and works via
instantiation (or refinement). The process of planning is
defined as choosing the desirable behavior by anticipating
admissible alternatives among possible behaviors and
selecting the best of them.

From Section 2, we conclude that all experiences
acquired and hypotheses generated contain some knowledge of
some reactive rules. For example, "if it is necessary to get to
S2 from S1, apply A12." This rule reacts by evoking A12 to
the need of getting into S2 from S1 at the i-th level of
resolution.

Trajectory  is  a  string  of  adjacent  admissible  elementary 
subdomains  (or  tessellata,  tiles  of  the  discretized  space). 

Any  well  posed  problem  of  planning  should  start  with 
assigning  the  initial  point  and  the  final  point  of  the 
trajectory  which  should  be  determined  at  the  higher  resolution 
within  the  tessellatum  of  the  resolution  under  consideration. 
The  feasible  trajectory  is  determined  at  the  lower  level  of 
resolution. 

PT is called a feasible trajectory if it has an initial and a
final point, and all tiles of the string are contained in the
feasible trajectory at the lower level of resolution. Thus, a
feasible trajectory for the level i is always represented as a
string of tiles for the i+1 level of resolution and is a
subspace of the i level of resolution. Clearly, the need for,
and the algorithms of, finding a feasible trajectory are tools
of focusing attention. We determine the subspace in which the
well-posed problem of planning should be resolved and the
optimum trajectory should be found. Indeed, the feasible
trajectory determined at the i+1 level of resolution becomes an
"envelope," a bounded domain of space at the i level of
resolution. Search in the state space, or S-search (see [8, 14-
20]), is done by synthesizing the feasible trajectories for a
particular level of resolution and then building the
alternatives of possible motion trajectories for the lower
resolution level within the envelope cost space.


Figure  5.  Illustration  to  the  MSJ  -algorithm. 

Some particular volume of the state space is designated
for a subsequent search for a solution. The operation of
contraction puts constraints on this volume and should be
properly justified. We need to reduce the probability that
contraction eliminates some or all of the opportunities to
find the optimum trajectory. The following heuristic
strategy of contraction is chosen. After the search at the
lowest resolution level is performed, the optimum
trajectory is surrounded by an envelope. It is a convex hull
which has a width w determined by the context of the
problem. Then, the generation of random points at the next
level of resolution is performed only within this envelope
of search. The problem of consistency of representation
under the contraction heuristic has to be addressed in the
future.
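
The contraction heuristic can be illustrated with a short sketch.
It rests on several assumptions that are not in the text: the state
space is two-dimensional, the coarse trajectory is a polyline, the
envelope is taken as the set of points within distance w of that
polyline (a simplification of the convex hull of width w), and
points are drawn by rejection sampling. All function names are
hypothetical.

```python
import math
import random


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment ab (2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_envelope(p, path, w):
    """True if p lies within distance w of the coarse path."""
    return any(dist_to_segment(p, a, b) <= w for a, b in zip(path, path[1:]))


def sample_in_envelope(path, w, n, rng=None):
    """Rejection-sample n random points inside the envelope around `path`."""
    if len(path) < 2 or w <= 0:
        raise ValueError("need at least two path points and a positive width")
    rng = rng or random.Random(0)
    xs = [x for x, _ in path]
    ys = [y for _, y in path]
    points = []
    while len(points) < n:
        p = (rng.uniform(min(xs) - w, max(xs) + w),
             rng.uniform(min(ys) - w, max(ys) + w))
        if in_envelope(p, path, w):  # discard points outside the envelope
            points.append(p)
    return points
```

The search at the next level of resolution would then connect only
the sampled points, so its cost depends on the envelope volume
rather than on the whole state space.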

The  Behavior  Generation  part  of  LPA  functioning  is 
illustrated  in  Figures  5  and  6. 

4.  Conclusions:  The  Issues  of  Further 
Research 

The results presented in this paper have been
confirmed experimentally [1,7,8]. "Baby-Robot" was able to
learn how to reach an arbitrarily situated goal only after
the algorithm of generalization with combinatorial
enhancement was implemented. Before this algorithm was
implemented, Baby-Robot was able to learn only how to reach a
goal at one particular location. If the location of the goal
was changed, the successful learning process for the previous
goal could not help to find the new one. Generalization with
combinatorial enhancement has enabled the robot to make
the discovery and initiate the process of hierarchical
learning. Other positive results are recorded in [19,20].

1. In this paper, we introduce and analyze an algorithm
of MR unsupervised learning with inductive generalization
and a search for hidden implications. This algorithm is
applied recursively to its own results at the output. Thus,
LPA develops an evolving MR system of representation. It
enables the system of behavior generation (BG) also to
evolve. LPA provides for an evolution of the automata
equipped with such systems. LPA becomes an MR
automaton, and its levels of resolution can change as the
evolution of knowledge and behavior proceeds. This
evolution can be illustrated by Figure 7.

2. From Figure 8, one can see that the behavior evolves,
producing different plans and motion trajectories, shown as
a horizontally developing tree. At the same time,
its knowledge evolves, as shown in the vertical hierarchical
structures.

This process seems to be even more important for
analysis of the evolution of living creatures. We believe that
LPA allows for analysis of the processes of evolution of its
systems as species. It is possible to equip the automaton with
a system of reproduction. It would then be possible to analyze
how the process of knowledge evolution is affected by
different mechanisms of reproduction.

3.  This  line  of  research  takes  advantage  of  the 
uniqueness  of  LPA  among  other  known  systems  of  automata 
with  learning.  The  mechanism  of  unsupervised  learning 
allows  for  the  ultimate  freedom  in  the  way  the  learning 
process  organizes  the  acquired  knowledge.  It  is  possible  to 
anticipate  that  as  the  knowledge  base  evolves,  the  knowledge 
becomes  utterly  diversified.  Rules  concerning  the  external 
world  will  emerge,  and  the  rules  concerning  processes  of 
inner  knowledge  organization  and  procedures  of  processing 
will  follow. 

4. LPA can be used to analyze all stages of learning
including the "early learning" stage. Certainly, some initial
knowledge ("bootstrap knowledge") is presumed. This
bootstrap organization of knowledge can strongly affect the
subsequent processes of knowledge evolution. On the other
hand, the learning system is presumed to be free in building
up all subsequent knowledge organization. How will it
organize the knowledge acquired, and why? This is the
research issue for LPA.

5.  The  processes  of  knowledge  acquisition  are  affected 
by  the  knowledge  stored.  They  start  creating  some  bias  in  the 
subsequent  knowledge  acquisition  since  the  results  of 
automaton  functioning  will  be  induced  by  the  knowledge 
previously  stored.  So,  if  the  results  of  functioning  were 
"good"  or  led  to  a  "better"  behavior,  the  system  might 
assume  that  its  goodness  is  due  to  the  knowledge  used.  It 
might  happen  that  the  experiences  the  system  acquires  are 
limited  by  its  predisposition. 

6. Since the system has been developed to demonstrate
some particular behavior, the evaluation of this behavior
should be the ultimate measure for the process and results of
knowledge organization, as well as for the processes and
results of the ways knowledge is acquired from the external
world and used for behavior generation. This determines
an interesting interconnection in the further development of
the hierarchy of functions for evaluation of goodness which
precipitate at different levels of resolution. The
interconnection between learning, behavior, and the emerging
"system of values" in LPA is an important research issue.


Figure 7. Evolution of Knowledge and Behavior of LPA
(axis label: TIME)

7. All three of these processes (acquisition, organization,
and use) affect the overall system functioning. Therefore,
other systems of learning can modify and alter the system of
LPA. At the present time the following mechanisms of
knowledge acquisition are known from the literature:

a) Learning by transfer. In this case all knowledge
which subsequently is required for behavior generation is
transferred from another source where it was stored and
organized in advance based upon existing design decisions and
experiences of functioning. This method of knowledge
acquisition presumes that the structure of the system of
interest and its functioning in required circumstances are
previously known.

b) Learning by examples. In this case, we
presume a "Teacher" which has substantial knowledge about
many cases of possible functioning, stores knowledge of
previous experiences, and spells out a set of possible
scenarios in which the functioning of the system is expected.
A set of tests can be developed in which the behavior of the
system is exercised, and after each case of behavior the
system receives the teacher's evaluation of whether it was
good, and how good it was.

The  difference  between  these  systems  and  LPA  can  be 
easily  seen. 

8. Learning and Behavior Generation produce
structural and behavioral hierarchies. It is possible to state
that using LPA reduces computational complexity by
increasing the structural complexity.

Abstract

This paper is concerned with probability density
estimation in high-dimensional settings. A breakthrough
in classifier performance has been achieved by
assuming that the multidimensional feature vector is
composed of statistically independent "groups". The
method, which we call independence grouping,
determines a mapping of features into separate
low-dimensional feature groups, which are regarded as
independent. The optimal size of the groups and the
best mapping are determined either by large numbers
of random trials, or by a pairwise test for mutual
independence of features. The bias associated with the
independence assumption is traded off against large
improvements in PDF estimation from the
dimensionality reduction.

1  Introduction 

In many classification problems of current interest,
no concise statistical model is available for the
competing hypotheses. Example: a microphone is placed at
a busy street intersection, and the noises received at the
microphone are to be classified. There are two choices
for such difficult problems:

1. Employ a single general model with a large number
of unknown parameters (such as a Neural Net
or Kernel-based PDF estimator).

2.  Employ  several  low-dimensional  models,  each 
specific  to  a  hypothesis. 

Either way, a large number of parameters (features)
must be estimated and used jointly for classification.
In such problems, the curse of dimensionality strikes,
causing the amount of data required to train the
classifier to rise exponentially, far outstripping the ability
to collect, store or process it.

Many researchers hypothesize that many natural
phenomena are inherently low-dimensional, but lie
in some unknown subspace or manifold of the
high-dimensional feature space. Learning this
low-dimensional structure is what we call structural
learning.

In this paper, we present a novel method of
structural learning that has achieved a breakthrough in
classifier performance. The method, which we call
independence grouping, determines a mapping of
features into separate low-dimensional feature groups,
which are regarded as independent. The best
mapping is determined either empirically by large
numbers of trials, or by direct determination of the
degree of independence. The bias associated with the
independence assumption is offset by large
improvements from the dimensionality reduction.

Because of practical considerations (computer and
time resources), many of the results presented in this
paper were obtained from different data sets.
However, the results are presented in the order in which
they were obtained, along with the thinking that led
to them. The author hopes that this style will
result in a very readable presentation.

2    Independence  Grouping 

Let there be N features organized into a vector
$x = \{x_1, x_2, \ldots, x_N\}$. The full-dimensional (FD)
classifier for M hypotheses,

$$\max_{j=1,\ldots,M} \; p(x|H_j)\, p(H_j),$$

makes use of the FD PDF estimate $p(x|H_j)$, which
is derived from some training data. We use K to
denote sample size and superscripts to denote sample
number: $\{x^1, \ldots, x^K\}$.

Let the N features be divided into G groups, each
group having $k_j$ features, thus $N = \sum_{j=1}^{G} k_j$.
The feature mapping is denoted by $\beta_i \in [1, G]$,
$i = 1, \ldots, N$, which maps each feature $x_i$ into a group.

Note that $G = 1$, $\beta_i = 1$ for $i = 1, \ldots, N$ is the
mapping for the FD PDF.

For each mapping $\beta = \{\beta_1, \ldots, \beta_N\}$, the
independence-grouped (IG) PDF for x is constructed
as a product of group-PDF's:

$$p(x|\beta,H_j) = p(\{x_i : \beta_i = 1\}|H_j) \cdot p(\{x_i : \beta_i = 2\}|H_j) \cdots p(\{x_i : \beta_i = G\}|H_j) \qquad (1)$$

The above expression carries with it an implication
that the groups are statistically independent of one
another. Note that (1) must be trained on a training
data set for each j. Training requires a double
maximization/optimization: optimize jointly over $\beta$ and
the parameters of the PDF estimator for the data
set available. When training the lower-dimensional
PDF's, less data is required. Or, with a fixed
training data size, they will be more accurate than the FD
counterpart. But is the increase in accuracy enough
to counter the bias associated with the assumption
of independence? In the cases we have studied, the
implicit assumption of independence among groups
simply is not true - there always seems to be
statistical dependence between any two features. We now
argue that the goodness of this assumption can be
directly tested.

Note that (1) is a PDF in its own right on the same
feature space and may be directly compared with the
FD classifier using likelihood value. The
appropriateness of the independence assumption will be reflected
in the total log-likelihood of the PDF estimate obtained
under the assumption. There is no need for guesswork
(note that we assume here that the total log-likelihood
"score" of the PDF estimate is obtained from a set of
data statistically independent of that used for
training, or else "overtraining" occurs).
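
The comparison can be sketched in pure Python. To keep the
example short, each group PDF is assumed to be a single Gaussian
fitted by maximum likelihood with a small ridge on the diagonal,
rather than the Gaussian mixture used in the paper; the function
names are hypothetical. `ig_loglik` scores equation (1) on
held-out data, and the FD case is the grouping with every
$\beta_i = 1$.

```python
import math


def fit_gaussian(rows):
    """Mean and covariance (with a small ridge) of rows (list of lists)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            + (1e-6 if i == j else 0.0) for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_gauss(x, mean, low):
    """Log density of N(mean, low low^T) at x, via forward substitution."""
    d = len(x)
    z = []
    for i in range(d):
        z.append((x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2 * math.pi) + logdet + sum(v * v for v in z))


def ig_loglik(train, test, beta):
    """Total log-likelihood of `test` under the IG PDF of equation (1).

    beta[i] is the group index of feature i; one Gaussian per group.
    """
    total = 0.0
    for g in sorted(set(beta)):
        idx = [i for i, b in enumerate(beta) if b == g]
        mean, cov = fit_gaussian([[r[i] for i in idx] for r in train])
        low = cholesky(cov)
        total += sum(log_gauss([r[i] for i in idx], mean, low) for r in test)
    return total
```

Calling `ig_loglik(train, test, beta)` for a candidate grouping and
for `beta = [1] * N` on the same held-out set gives two scores that
can be compared directly.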

We claim that the high-dimensional PDF, when
optimized to fit a limited set of training data, may
be making far worse compromises in the process of
training, although these compromises are hidden from
view in the vast reaches of $\mathcal{R}^N$. Should (1) provide
better performance than the FD counterpart, then
there is no sane reason not to accept it. After all,
recall that the FD classifier is contained in the set of
all groupings.


3  PDF  Estimation 

In this paper, estimates of FD or IG PDF's
are derived from training data using a standard
multivariate PDF estimation approach called
heteroscedastic Gaussian mixture (GM) modeling. A widely
accepted technique for estimating the parameters of the
GM model is the EM algorithm [1],[2]. The EM
algorithm suffers from numerical problems when there
is insufficient data, leading some researchers to avoid
it [3] or constrain the covariances of the kernels to
be identical [4], or of uniform size with variable
rotation [5]. Adding to the covariance estimates based on
a Bayesian prior density argument is the preferred
method of dealing with the problem [6], [7]. We
have obtained excellent results using Bayesian priors
to represent an implied assumption of measurement
noise. Measurement noise may represent quantization
noise or subjectively determined error variances.

4  Optimum  Group  Size 

We have not mentioned how G or $k_j$ are determined.
We start by selecting G and $k_j$ empirically. We will
shortly see that for a given problem, algorithm
performance has a broad peak at a certain group size,
say $k^*$.

In the following experiment, a data set of time
series "snapshots" from M = 6 signal types was
available. The data was divided into 5 sets containing
different amounts of additive noise to artificially
produce a variation of SNR in 3 dB increments. A total
of N = 44 features were extracted from each time
series. An average of about 150 samples was available
from each class at each SNR for training and testing.

The group sizes were fixed and $\beta$ was common
across hypotheses. For each $\beta$, PDF's were estimated
on the groups from training data, the combined PDF
(1) was formed, and classification performance
(probability of correct classification, Pcc) was measured
using a separate testing data set. This was repeated for
a range of group sizes. When we speak of trials, we
refer to independent random data splits into
training and testing halves. The Pcc was averaged across
trials. Pcc was measured by counting the number of
correct decisions as a fraction of the total number of
data samples used in testing.

The 44 features were grouped into sets of
2, 3, 4, 5, 6, 7, 9, 11, 17, and 21 features. The last group
contained the remaining features when 44 was not
perfectly divisible by the group size. A total of 16
statistical trials were run at each SNR and for each group
size. In each trial, not only was the data randomly split, but
the features were randomly assigned to groups.
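
A random assignment of this kind can be sketched as follows. The
function name `random_grouping` and its signature are hypothetical;
the sketch only assumes what the text states, namely fixed group
size, random assignment, and a smaller last group holding the
remainder.

```python
import random


def random_grouping(n_features, group_size, rng=None):
    """Randomly assign features to groups of `group_size`.

    Returns beta, where beta[i] in 1..G is the group of feature i.
    The last group holds the remainder when n_features is not a
    multiple of group_size.
    """
    rng = rng or random.Random()
    order = list(range(n_features))
    rng.shuffle(order)  # random permutation of feature indices
    beta = [0] * n_features
    for position, feature in enumerate(order):
        beta[feature] = position // group_size + 1
    return beta
```

For N = 44 and a group size of 5, this gives eight groups of five
features and a ninth group of four.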
It was expected that using very small groups would
induce poor performance due to the (incorrect)
assumption of independence between groups. For larger
groups, the dimensionality issues would dominate.
The net result would be an optimal dimension. The
experiment was repeated for various levels of additive
Gaussian noise (SNR reductions) in 3 dB steps. This
expectation is very clearly supported in Figure 1. The
group size (dimension P) is plotted on the abscissa.


[Plot: Pcc vs. P at various SNRs (in 3 dB steps)]

Figure 1: Classification performance as a function of
feature group size, average of 16 random trials.

The importance of Figure 1 cannot be overstated.
Conventional thinking in classification has been to
increase N using the FD classifier until a collapse in
performance is noted due to dimensionality issues. Here,
we keep N fixed but are able to adjust the
dimension nevertheless by adjusting group size. The
performance peak at P ≈ 5 shown in Figure 1 cannot be
found by existing methods of model order selection,
which do not utilize all the features at once.

5    Best  Grouping 

The limitation of the previous experiment is that
no attempt was made to find an optimal grouping,
which should be different for each class hypothesis.
However, it still clearly indicates the existence of a
broad maximum for group size around $k^* = 5$.

The following experiment is a variation of the
previous M = 6 experiment. A total of N = 64 features
were used in the random grouping scheme. Random
group sizes in the range of 3 to 6 were used.
As before, the groupings were held constant across
hypotheses. For each grouping, Pcc was averaged
across SNR. A total of 1800 groupings were tested
on the CRAY T3D at NUWC in Newport, RI. The
best grouping of the 1800 was selected. Multiple
trials were then run using this "best grouping" to more
accurately measure its performance.

When measuring Pcc, it was necessary to add a
constant to each log-PDF (equivalent to adjusting prior
probabilities of the hypotheses in a Bayesian
framework). The 6 constants were adjusted to minimize
the total number of errors. This optimization was
repeated for each trial. The grouping with the best
average performance was then used in 16 trials at each
SNR step. In Figure 2, the resulting performance
of the best grouping is plotted, in which RG stands
for random grouping.

Figure 2: Classification performance vs. SNR of the
best grouping out of 1800 trials for IG PDF (denoted
RG) vs. three other classifiers (see text).

This graph shows the Pcc performance as a function
of SNR for the best random grouping as compared with
three other classifiers:

1. GM48: Full-dimensional classifier for 44 features
using 48 mixture components.

2. KNN(k = 3): K-nearest neighbor classifier (k =
3).

3. GM1: Full-dimensional classifier for 44
features using 1 mixture component, essentially a
quadratic Bayesian classifier.

Each data point on the graph represents an
average of 16 trials, each using a different random data
split of the same overall data set. The IG classifier
clearly outperforms the other classifiers.

5.1    Measure  of  Fit  (MOF) 

Up  to  now,  we  have  optimized  the  feature  grouping 
by  maximizing  estimated  Pcc.  This  has  two  disadvan- 
tages: 

1.  It requires a lot of computation each time Pcc is estimated. This limits the number of combinations that can be tried.

2.  It constrains each class to have the same grouping (it would not be practical otherwise).

In  an  attempt  to  avoid  these  problems,  alternative 
ways  were  sought  to  optimize  the  groupings. 

Consider an algorithm for training a Gaussian mixture estimate of $p(x|H_j)$. Such an algorithm operates by maximizing the average or total log-likelihood of the data, $\log p(x_k)$. But, as mentioned earlier, (1) is a PDF on $\mathcal{R}^N$ in its own right. There is no reason not to jointly optimize the groupings at the same time as the parameters of the Gaussian mixture based on total likelihood. This incorporates grouping selection (structure learning) directly in the PDF estimation of a given class. Thus, grouping selection is separated from classification performance. More grouping trials could be tested per class for a given amount of computer resources. In addition, differences in the data structure of different classes could be taken into account.

In Figure 3, we show 32 histograms. Each histogram is created from 37 random groupings for a given random data split. It may be seen that the distribution of the average likelihood depends greatly on the data split. It is theorized that some data splits contain data points that are likely to be "alone", without neighbors, and not likely to be "covered" by PDF mixture kernels, which tends to bias the likelihood value down. Thus, the determination of the "best grouping" overall had to factor in the bias of each random data split.

The best grouping overall for each class hypothesis was selected. An improvement in classification performance was noted; however, it was not significant. The author believes that this is due to the high-dimensional space in which to search for the best $\beta$. In the following, we break through this barrier.

5.2    Independence  Measure 

If the implied assumption of independence groupings is the statistical independence between groups, then it makes sense to obtain a measure of feature interdependence. Unfortunately, for practical reasons, we must limit ourselves to pair-wise tests. We are not interested in simple correlation coefficients since they are not descriptive enough: uncorrelated random variables can be statistically dependent. The scheme we use is based on a hypothesis test for independence vs. dependence. More specifically, the independence hypothesis says that

$$p(x_i, x_j) = p(x_i)\,p(x_j).$$

The procedure is then to estimate the three PDF's $p(x_i, x_j)$, $p(x_i)$, $p(x_j)$ from available data, then form the empirical independence measure

where $K$ is the number of available training samples and the superscript indicates the sample index.

Figure 3: Histograms of random groupings for 32 different random splits
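The defining equation of the measure is not reproduced in this text, so the following Python sketch shows only one plausible form, assumed purely for illustration: the sample average of log[p(x_i, x_j) / (p(x_i) p(x_j))], with the three PDFs estimated by simple counts over discretized feature values. The function name and the counting estimator are hypothetical and are not claimed to be the author's implementation.

```python
import math
from collections import Counter

def empirical_dependence(xi, xj):
    """Average log-ratio of joint to product-of-marginals over K samples.

    xi, xj: equal-length sequences of discretized feature values.
    Zero means the empirical joint factorizes; larger means more dependence.
    """
    K = len(xi)
    joint = Counter(zip(xi, xj))
    marg_i, marg_j = Counter(xi), Counter(xj)
    total = 0.0
    for a, b in zip(xi, xj):
        p_joint = joint[(a, b)] / K
        p_prod = (marg_i[a] / K) * (marg_j[b] / K)
        total += math.log(p_joint / p_prod)
    return total / K
```

Under this assumed form, two identical binary features score log 2, while two features whose joint counts factorize exactly score 0.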

This  measure  was  computed  for  all  feature  pairs 
in  a  70-feature  example.  The  results  are  provided  in 
Figure  4.  In  the  graphic,  feature  dependence  measure 
is  scaled  and  quantized  to  a  number  between  0  and 
15,  represented  as  a  hexadecimal  character.  Anything 
below  3  is  shown  as  These  types  of  graphics  are 
then  used  to  manually  or  automatically  determine  the 
best  feature  groupings. 

The following procedure was found to be satisfactory:

1.  Determine the largest group size that can be accepted.

2.  Any two features with a dependence measure greater than a threshold r are placed in the same group.

3.  Begin with r equal to some large value.

4.  Lower r until the largest group size has been reached.

5.  Continue to lower r, grouping together features, but do not add to groups that have exceeded the maximum group size.

6.  Terminate when r reaches some minimum limit.

Figure 4: Graphic of feature interdependence

This procedure can be automated. The graphic in Figure 4 produced the grouping in Table 1. The reader may notice that the largest group size is 11. Many of the features in the largest group were highly dependent and could not be justifiably separated. Furthermore, highly dependent (or correlated) features could be grouped together without greatly increasing the ultimate volume of the space. Thus, groups with highly dependent features could be larger.
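Since the procedure can be automated, a minimal Python sketch of one possible automation follows. It assumes a symmetric matrix `dep` of pair-wise dependence measures and a caller-chosen list of threshold values to step through; the names `group_features`, `dep`, `max_size` and `thresholds` are hypothetical, and refusing any merge that would exceed the maximum size is one interpretation of steps 4 and 5, not necessarily the author's.

```python
def group_features(dep, max_size, thresholds):
    """Group features by stepping a dependence threshold downward.

    dep: symmetric n x n matrix (list of lists) of dependence measures.
    max_size: largest group size that can be accepted.
    thresholds: threshold values to visit, processed from largest to smallest.
    """
    n = len(dep)
    group_of = list(range(n))             # group id of each feature
    members = {i: [i] for i in range(n)}  # group id -> member features
    for tau in sorted(thresholds, reverse=True):
        for i in range(n):
            for j in range(i + 1, n):
                gi, gj = group_of[i], group_of[j]
                if gi == gj or dep[i][j] <= tau:
                    continue
                # Do not grow a group beyond the accepted maximum size.
                if len(members[gi]) + len(members[gj]) > max_size:
                    continue
                for f in members[gj]:
                    group_of[f] = gi
                members[gi].extend(members.pop(gj))
    return sorted(sorted(g) for g in members.values())
```

Within one threshold level this sketch visits pairs in index order rather than strongest-first, which is a simplification.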


5.3    Results of Groupings based on Independence Measure

Consider the second (from top) histogram in Figure 3 (trial number 31 of 32). We expand this histogram in Figure 5. In addition, a single point is added at the far right. This point is the likelihood value of the PDF using the grouping determined from pair-wise independence measures and tested with exactly the same data split as the rest. The fact that this point is far outside the population of the randomly selected groupings graphically illustrates how significant the independence measure method is with respect to random grouping selection.


Group   Features
1       13 14 15 23 25 27 28 43 46 24 29
2       5 8 40 21 22
3       19 20 34 18 31
4       55 56 57 58 59
5       67 68 69
6       4 45 3
7       6 10 44
8       26 30 17
9       37 38
10      41 42
11      7
12      11
13      12
14      16
15      32
16      33
17      36
18      39
19      47
20      48
21      49
22      51
23      52

Table 1: Feature grouping corresponding to Figure 4

5.4    Genetic Algorithm for Grouping Optimization

Neither random search nor search based on the pair-wise independence measure is expected to locate the best grouping. Since the search is on a discrete space, gradient-based methods cannot be employed. However, a genetic algorithm is well suited for the task. In the preliminary implementation, the following approach was employed. A starting point was selected based on random search or independence measure. Then, a mutation was performed by applying one of three possible changes:

•  Swap  2  features. 

•  Move  1  feature. 

•  Merge  2  groups. 

All operations used a uniformly distributed random variable to select the group or feature. If a feature was moved outside the range of existing groups, a new group was created. Approximately 1000 single mutations were performed and the MOF was determined. The 8 mutations which produced the largest change in MOF with respect to the original parent for the same random split were selected. These 8 children were then used as parents (asexually) for the next generation. The MOF was always calculated on testing data, i.e. data unseen by the PDF training algorithm. A histogram of the MOF of each generation is shown in Figure 6. Beginning from the bottom, the MOF histogram for the parent "hand-selected" grouping determined from pair-wise independence is shown along with a single data point on the right indicating the MOF for the training data. Loosely speaking, this represents an upper limit of achievable MOF for testing data. Next, the MOF histograms of generations 1 through 5 are shown. The smoothness of these histograms reflects the large number of groupings included, in contrast to the single parent grouping. Note a shift to the right with each generation. The best grouping of generation 5 was selected. The next histogram, denoted "Gen 6", is its MOF histogram. Note that again we include the "upper limit" point, the average MOF for training data. The final histogram of the FD PDF (at top) provides a contrast that clearly illustrates the characteristic of IG PDF's. Note that the average likelihood for the training data (single points at right) is higher for FD vs. IG. This is expected since it reflects the fact that the IG PDF is more constrained than the FD PDF and cannot fit the training data as well. However, when testing data is used, the more robust IG PDF's far outperform the FD PDF.

Figure 5: Histogram of 37 random groupings using same data split (horizontal axis: average likelihood)

Figure 6: Five generations of Genetic Algorithm
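The mutation scheme of this section can be sketched in Python as follows. A grouping is assumed to be a list of lists of feature indices, and `mof` is a hypothetical caller-supplied function returning the measure of fit of a grouping on held-out data. Ranking children by their MOF gain over the parent is an assumption about how the "largest change" was selected.

```python
import random

def mutate(grouping, rng):
    """Apply one random change: swap 2 features, move 1 feature, or merge 2 groups."""
    groups = [list(g) for g in grouping]
    op = rng.choice(["swap", "move", "merge"])
    if op == "swap" and len(groups) >= 2:
        a, b = rng.sample(range(len(groups)), 2)
        i, j = rng.randrange(len(groups[a])), rng.randrange(len(groups[b]))
        groups[a][i], groups[b][j] = groups[b][j], groups[a][i]
    elif op == "move":
        src = rng.randrange(len(groups))
        feat = groups[src].pop(rng.randrange(len(groups[src])))
        dst = rng.randrange(len(groups) + 1)
        if dst == len(groups):
            groups.append([feat])  # moved past the last group: new group
        else:
            groups[dst].append(feat)
    elif op == "merge" and len(groups) >= 2:
        a, b = sorted(rng.sample(range(len(groups)), 2))
        groups[a].extend(groups.pop(b))
    return [g for g in groups if g]  # drop groups emptied by a move

def next_generation(parents, mof, rng, n_mutations=1000, n_keep=8):
    """Mutate random parents and keep the n_keep children with the largest MOF gain."""
    scored = []
    for _ in range(n_mutations):
        parent = rng.choice(parents)
        child = mutate(parent, rng)
        scored.append((mof(child) - mof(parent), child))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [child for _, child in scored[:n_keep]]
```

Repeating `next_generation` five times, feeding each result back in as `parents`, mirrors the five generations described above.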

6    Conclusions  and  Future  Work 

An entirely new dimension has been added to the PDF training phase of classifiers (or any PDF-based algorithm). Structural learning finds the best structure upon which to train the PDF. While PDF estimation must be performed by matching the PDF parameters to best fit the given training data, structural learning must be performed by fitting to the testing data, unseen by the PDF estimator. This encourages the combined structure-PDF estimate to be robust while at the same time optimizing the measure of fit.

LEARNING FROM LARGE DIMENSIONAL, SMALL SAMPLE DATA BY A POTENTIAL FUNCTION METHOD WITH NONLINEAR TRANSFORMATION: A CANCER DIAGNOSTIC STUDY

1  Introduction

Machine learning methods are especially well suited to medical diagnostic applications. One such application is the screening and diagnosis of cancerous and pre-cancerous conditions. In the present paper we discuss novel techniques of machine learning applied to the analysis of data from a new optical biopsy device where the dimensionality of each original observation is very large (60,000 items) and where the number of observations is very small (approximately one hundred).

We have analyzed data collected by the ColpoProbe device (MediSpectra, Inc., Cambridge, MA) at the colposcopy clinic of the Beth Israel Hospital, Boston [L. Burke et al., 1996]. It is widely accepted that Pap Smear testing has been successful in reducing cervical cancer in the U.S. by more than 70%. It is, however, also recognized that the methods are far from ideal and not cost effective. Each year, 50 million women in the U.S. receive a Pap Smear test and of those, approximately 6% are returned with a finding of ASCUS (Atypical Squamous Cells of Undetermined Significance). This diagnosis represents minor Pap Smear abnormalities that are not sufficiently clear-cut to permit a more specific diagnosis. Because of obvious prognostic concerns, this population requires thorough evaluation to determine whether benign or malignant processes are more likely.

The triage of patients with ASCUS results from Pap smears is a growing problem. Currently, patients are either referred for repeat Pap smears or directly to colposcopy. There have been several studies in the literature on the use of Pap smears for triaging non-negative Pap smears. These show that the sensitivity of the Pap smear when used as a follow-up is typically between 40 and 62% (Slawson, D.C., et al., 1994; Slawson, D.C., et al., 1993; Xiao-Wei Sun, et al., 1995). This results in a significant number of patients with ASCUS Pap smears not being referred to colposcopy. Additionally, the relatively high false positive rates of the repeated Pap smear imply that many women would be inappropriately referred to colposcopy [M.F. Mitchell, 1994].

MediSpectra has developed the ColpoProbe device, an optical biopsy instrument, as a triage tool for following an ASCUS Pap Smear result in order to select those women most likely to require colposcopy. Increasing the sensitivity of the procedure will result in more patients receiving the required treatment, while increasing the specificity will result in fewer unneeded colposcopies. In order to quantify the accuracy of the ColpoProbe device, MediSpectra has completed a feasibility study of the instrument. We have analyzed the data collected by the ColpoProbe to assess its performance. The present paper evaluates the diagnostic power of the method using a novel machine learning strategy to efficiently process the large amount of data produced. It also emphasizes methodological questions surrounding the application of these machine learning techniques.

2     Data  description 

A single observation from the original data set comprises 300 spectrograms of fluorescence emission. Each emission lasted 200 nsec or less. Each emission spectrum is divided into 185 spectral components. Usually the fluorescence emission spectra have high noise. Therefore, it is necessary to aggregate the observations to increase reliability in the processing. We used the following aggregation before analysis:

1.  The average spectrum, i.e. mean spectrogram, was computed from the 300 individual observations;

2.  The average spectrum was filtered to decrease the noise;

3.  The filtered average spectrum was aggregated in the following way: the original 185 components were separated into a small number of groups. Each group was represented by the mean and the standard deviation of the original spectral components in the group.

These steps of "pre-analysis" yield an efficient data representation. After the preparation, each spectrum was represented by 51 variables, giving a set of vectors in 51-dimensional space. Below we also call these vectors records or cases. The set consisted of 81 cases classified by a histopathologist into five biopsy result categories, denoted CC, Metapl, NED, SILHI and SILLO, with the distribution indicated in Table 1 below:


Table 1: BIOPSY

CC    Metapl    NED    SILHI    SILLO    Total
7     26        19     21       8        81

In this paper we present results of discrimination of these 81 records when they were combined into two groups of biopsy categories, namely "precancerous" (denoted SIL and including the SILHI and SILLO categories) and "non-precancerous" (nonSIL, consisting of the other three categories CC, Metapl and NED). Thus there are 29 SIL and 52 nonSIL records.

3  A  New  Machine  Learning  Algorithm 
based  on  the  Potential  Function  Method 
and  on  a  Nonlinear  Transformation  for 
Extracting  Discriminating  Features 

3.1     Overview  of  Method 

Most machine learning methods build object-classification functions in a space of features. Input objects are represented as vectors of features and can be interpreted geometrically as points in the feature space. Thus, any surface which splits the space into regions associated with the different classes can serve as an object classification function.

The main problem in machine learning is to derive or learn a classification surface from a limited number of cases of known classification. In other words, the goal of learning is to generalize the knowledge from a limited set of samples. In geometrical terms, the goal is to construct a discriminating surface using the limited set of labeled points in the feature space so that it will be able to identify any new set of input vectors (points) of unknown classification. This surface is called a discriminant surface, and the mathematical function which describes it in the feature space is called a recognition function, or a recognizer [Debroye et al., 1996].

The potential function method is a universal machine learning method [O. Bashkirov, E. Braverman and I. Muchnik, 1964; Aizerman et al., 1974; E. Braverman and Muchnik, 1983]. Recently it took on renewed importance because the support vector machine approach [V. Vapnik, 1995] has demonstrated great efficiency in applying integral kernels as a system of functions for building discriminants. Learning procedures based on this method allow practitioners to incorporate and test a wide variety of alternative sub-methods that can be adapted or customized to specific problems. The principal idea is to identify a similarity function between a vector and a class represented by a collection of vectors of known classification (known as the training set). Each class will be identified by its particular similarity function. This function is constructed as a linear combination of integral kernels (the potential functions), each of which is associated with one vector from the training set.

A discriminant is defined as the maximum of the similarity functions. Thus, for each vector, the values of the similarity functions are calculated and the largest of these determines the class associated with the vector. To learn the class identification is to determine the coefficients of the similarity functions. As in all machine learning methods, the potential function method has free parameters that must be learned from the training data. In addition, it allows one to choose a type of integral kernel that corresponds to the implicit structure of similarity or dissimilarity in the training data. The disadvantage of this additional freedom is that the search for the best kernel and its parameters can be quite expensive and time-consuming.

Over 30 years ago, a two-stage recognizer design was proposed to transform the original variables into variables that are "informative" in a particular sense [Sebestyen, 1962]. The simple rationale here was that the fit between the learning procedure and the training samples is expected to be better in the new space than in the original one. In most cases the transformation was linear [W. Weiss and C.A. Kulikowski, 1991; S. Watanabe, 1969]. Further development of this approach for recognizer design using a non-linear transformation model was originally suggested by Fukunaga [K. Fukunaga, 1992]. However, this method does not lend itself to wide practice because, for example, it does not explore combination with the potential function method at all.

In this paper, we describe and test a new non-linear procedure for transforming a space of original variables into one that is more discriminating. The result of applying this technique is a set of new features with approximately equal degrees of information content. The potential function method can work better with these features since they can then be combined directly without the need for separate weights. The potential functions, then, correspond to simple distance functions.

The new nonlinear transformation is based on three simple steps:

1.  Extend the set of original variables by adding to them non-linear transformations such as their products, ratios, etc.

2.  Build Boolean variables by applying cut-points derived from the training data to yield the best predictive performance for the given classifications based on a statistical similarity criterion. The idea of using a Boolean space for a recognizer was very popular in the 60's [Bongard, 1968] and is being implemented again now that the machine learning area is integrated with the AI approach [P. Hammer et al., 1995]. Our realization of the idea is based on a particular systematic approach using Fisher's exact test.

3.  Keep the part of the generated Boolean variables which are not correlated.

The result of the first two steps of the procedure is a set of Boolean variables instead of the original numerical ones. A third step suppresses redundant information from these derived Boolean variables. It keeps as a final set only those informative features which are correlated no higher than some chosen threshold. Because the procedure produces a Boolean space instead of a numerical one, it is natural to use the potential function method with kernels based on a Hamming distance function. Next we concentrate on the specific implementation of the described combined algorithm for our problem. One can easily change the algorithm by varying the described procedure.

Note that, in order to maximize the use of a data set, it is common to derive a feature space transformation such as the above based on the entire data set (i.e. training and testing sets combined). However, we determine the weights of the kernels for the potential function algorithm from the training data set alone.

3.2    Principal Steps of the Learning Procedure

Below we give a formal description of the main procedures for the presented method.

Transformation from numerical data into boolean data. Three procedures are described here which require the following notation and definitions. Let us denote by $X = \|x_{ij}\|$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, n$, any given matrix of $N$ rows $\{x_{i\cdot}\}$ which represent objects characterized by $n$ numerical variables.

The first procedure extends $X$ to $Y = \langle X, X' \rangle$, where the new variables in $X'$ are, as mentioned above, the products and ratios for all possible pairs of the $n$ variables from $X$. Every pair of original variables $[x_k, x_{k'}]$ determines three new numerical variables: $(x_k x_{k'},\ x_k / x_{k'},\ x_{k'} / x_k)$. The total number of variables in $Y$ becomes $n(n-1) + n(n+1)/2$.

The second procedure analyzes all possible cut-points on every y-variable ($y$ is a column of the matrix $Y$; a cut-point is a value of $y$ which splits all the y-values given in the matrix $Y$ into 2 groups: those smaller than the cut-point and those larger than it). This analysis permits us to evaluate how well the associated boolean variable correlates with the output classification, which is also a boolean variable. We denote the output variable by $C$: if for the $i$-th record the corresponding value $C_i = 1$, the $i$-th record belongs to class A, and if $C_i = 0$ it belongs to class B.

One boolean variable $z_{sk}$ is determined by the conditions

$$z_{sk} = 1 \text{ if } y_k > y^*_{sk} \quad \text{and} \quad z_{sk} = 0 \text{ if } y_k < y^*_{sk},$$

where $y^*_{sk}$ is the $s$-th cut-point on the corresponding numerical variable $y_k$. One evaluates the variable $z_{sk}$ by calculating the 2×2 table

    a  b
    c  d

where $a = \#_i(z_{is(k)} = 1 \,\&\, C_i = 1)$, $b = \#_i(z_{is(k)} = 1 \,\&\, C_i = 0)$, $c = \#_i(z_{is(k)} = 0 \,\&\, C_i = 1)$, $d = \#_i(z_{is(k)} = 0 \,\&\, C_i = 0)$.

In the above equalities the following notation was used: $\#_i(\text{content})$ is the number of observations in the given data which match the conditions defined by the content.
The table allows us to use a statistical measure to estimate the probability $P$ of the event "$z_{is(k)} = 1 \,\&\, y_{i\cdot} = A$ and $z_{is(k)} = 0 \,\&\, y_{i\cdot} = B$". After the evaluation one chooses a threshold $P^*$ for the probability $P$: if $P > P^*$ (or $1 - P > P^*$) then $z_{s(k)}$ is kept as a candidate significant boolean variable. Many different statistics can be used for this purpose, of which the simplest is to directly use the ratio $p = (a + d)/(a + b + c + d)$: if, for example, $P^* = 0.7$, then $z_s$ is significant if $p > 0.7$ or if $(1 - p) > 0.7$. In our concrete research we used the so-called Fisher exact test [Fleiss, 1982; Hays, 1978; Woolson, 1987]. The quantities $a$, $b$, $c$, and $d$ are the numbers of cases in the 2×2 confusion matrix where the biopsy classes nonSIL and SIL denote the true classification and VBOOLi = 0 denotes a negative screening result from using the $i$-th Boolean variable related to a significant variable, while VBOOLi = 1 would correspond to a positive result for the same variable.
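The cut-point analysis with the simple agreement ratio can be sketched in Python as follows. The function names are hypothetical, and taking candidate cut-points as midpoints between consecutive distinct values is an assumption; the text only says that all possible cut-points are analyzed.

```python
def candidate_cuts(y):
    """Midpoints between consecutive distinct sorted values (an assumed choice)."""
    values = sorted(set(y))
    return [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

def two_by_two(y, labels, cut):
    """Counts (a, b, c, d) of the 2x2 table for z = 1 if y > cut, else z = 0.

    a: z = 1 and C = 1;  b: z = 1 and C = 0;
    c: z = 0 and C = 1;  d: z = 0 and C = 0.
    """
    a = b = c = d = 0
    for value, label in zip(y, labels):
        z = 1 if value > cut else 0
        if z == 1 and label == 1:
            a += 1
        elif z == 1 and label == 0:
            b += 1
        elif z == 0 and label == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d

def is_candidate(a, b, c, d, p_star=0.7):
    """Keep the boolean variable if p > P* or (1 - p) > P*, p = (a + d) / total."""
    p = (a + d) / (a + b + c + d)
    return p > p_star or (1 - p) > p_star
```

A variable that agrees with the class label on nearly all records, or disagrees on nearly all, passes this screen; one that agrees on about half does not.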


                    BIOPSY
VBOOLi    nonSIL    SIL     Total
0         a         b       a+b
1         c         d       c+d
Total     a+c       b+d     a+b+c+d

For this table the calculation of Fisher's exact (2-tailed) test was carried out. The probability of measuring $a$, $b$, $c$ and $d$ in the table under the assumption that the percentages $1-s$ and $s$ for the two classes (nonSIL and SIL) are the same is then given by the following formula:

$$\mathrm{Fisher}_t = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\,a!\,b!\,c!\,d!}$$  (1)

Variables are evaluated and selected by using the values of

$$F_{\min} = \min_t \mathrm{Fisher}_t.$$  (2)

This procedure screens all the variables $y_k$ for all the corresponding cut-points $s(k)$ and collects a complete set of the candidates for significant boolean variables. We denote the set by $S$ and let the cardinality of $S$ be $M$. This completes the second procedure.
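As a worked illustration of formulas (1) and (2), the following Python sketch evaluates the table probability exactly with rational arithmetic. The function names are hypothetical. Treating the index t as ranging over the candidate tables of one variable is an assumption, and the two-tailed sum shown is the usual textbook construction (adding up all tables with the same margins that are no more probable than the observed one), not a detail confirmed by the text.

```python
from fractions import Fraction
from math import factorial

def table_probability(a, b, c, d):
    """Formula (1): probability of the 2x2 table with fixed margins."""
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    den = (factorial(a + b + c + d) * factorial(a)
           * factorial(b) * factorial(c) * factorial(d))
    return Fraction(num, den)

def fisher_two_tailed(a, b, c, d):
    """Sum the probabilities of all same-margin tables no likelier than the observed."""
    r1, r2, c1 = a + b, c + d, a + c
    observed = table_probability(a, b, c, d)
    total = Fraction(0)
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        p = table_probability(x, r1 - x, c1 - x, r2 - (c1 - x))
        if p <= observed:
            total += p
    return total

def f_min(tables):
    """Formula (2): smallest formula-(1) value over a list of (a, b, c, d) tables."""
    return min(table_probability(*t) for t in tables)
```

For the perfectly separating table (3, 0, 0, 3), formula (1) gives 1/20 and the two-tailed sum gives 1/10.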

Significant variable extraction. The final step is to extract a set $S'$ of highly significant boolean variables, which we define as the smallest subset of $S$ which has a high enough level of correlation with the variables from the complementary subset $(S - S')$. One can apply different heuristic procedures to obtain this type of core set. We used one of the simplest:

1) build a matrix of correlations on the defined set of all $M$ candidates for the significant variables;

2) for every candidate, calculate the sum of the absolute values of the coefficients of its correlations with all other variables;

3) extract a set $S'$, with cardinality equal to an a priori chosen value $M'$, consisting of the candidates with the largest values of the sums.
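The three steps of this heuristic can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions not stated in the text: a constant variable is given correlation 0 with everything, and ties between equal sums are broken by the lower index.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_core(columns, m_prime):
    """Indices of the m_prime candidates with the largest sums of |correlation|."""
    sums = []
    for i, ci in enumerate(columns):
        total = sum(abs(pearson(ci, cj))
                    for j, cj in enumerate(columns) if j != i)
        sums.append((total, i))
    # Sort by descending sum; ties fall back to the lower index.
    sums.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(i for _, i in sums[:m_prime])
```

Here `columns` holds one list of 0/1 values per candidate boolean variable, and `m_prime` plays the role of $M'$.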

Potential function procedure. We chose to use the potential function method in its non-parametric form modeled by a parametric family of integral kernels [Aizerman et al. 1976]:

$$K(u, v) = \frac{1}{a + b\,r^q(u, v)},$$  (3)

where $u$ and $v$ are vectors whose components are the extracted significant boolean variables (vectors in the $M'$-dimensional space); $r(u, v)$ is a distance function defined on an arbitrary pair of vectors and $\{a, b, q\}$ are three scalar free parameters. Because $K$ is a monotonically decreasing function of the distance function we can interpret it as a similarity measure. The method introduces a similarity function $K(u, G)$ with values that are average similarities between a vector $u$ and a class of vectors $G$. The function is the mean of the above described elementary similarity measure $K(u, v)$:

$$K(u, G) = \frac{1}{|G|} \sum_{v \in G} c(v)\,K(u, v),$$  (4)

where $|\cdot|$ is the notation for cardinality, and $c(v)$ is a function of weights defined on all elements of $G$. Let us assume that the training set for class A contains $N_1$ vectors and the training set for class B contains $N_2$ vectors; then the discriminant rule is given by the function:

$$F(u) = K(u, A) - K(u, B).$$  (5)

The rule is simple: if $F(u) > 0$ then $u$ belongs in class A, and vice versa. The function of weights $c(v)$ is defined on the training set by a standard algorithm of the potential function method: the training set is observed iteratively; an observed element $u$ is recognized by the above rule based on the current values of the function $c(v)$, and if it is recognized incorrectly the corresponding weight $c(u)$ is increased to $[c(u) + 1]$; if it is recognized correctly no weights change. The procedure is stopped when a cycle of observations of all elements in the training set does not yield changes in the value of the function $c(v)$.
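The potential function procedure can be sketched in Python under a few stated assumptions: the distance r is taken to be the Hamming distance (suggested in Section 3.1 for Boolean spaces), all weights start at 1, F(u) = 0 is assigned to class B, and a cap on the number of cycles is added because the text does not discuss convergence. All function names are hypothetical.

```python
def hamming(u, v):
    """Hamming distance between two equal-length boolean vectors."""
    return sum(x != y for x, y in zip(u, v))

def kernel(u, v, a=1.0, b=1.0, q=1.0):
    """Formula (3): K(u, v) = 1 / (a + b * r(u, v)**q), with r = Hamming distance."""
    return 1.0 / (a + b * hamming(u, v) ** q)

def similarity(u, group, weights):
    """Formula (4): weighted mean similarity between u and a class of vectors."""
    return sum(c * kernel(u, v) for v, c in zip(group, weights)) / len(group)

def discriminant(u, A, cA, B, cB):
    """Formula (5): F(u) = K(u, A) - K(u, B); F(u) > 0 means class A."""
    return similarity(u, A, cA) - similarity(u, B, cB)

def train(A, B, max_cycles=100):
    """Raise c(u) by 1 for each misrecognized training vector u until a full
    cycle changes nothing (or max_cycles is reached; the cap is an assumption)."""
    cA, cB = [1.0] * len(A), [1.0] * len(B)  # assumed initial weights
    for _ in range(max_cycles):
        changed = False
        for i, u in enumerate(A):
            if discriminant(u, A, cA, B, cB) <= 0:  # class A vector recognized as B
                cA[i] += 1
                changed = True
        for i, u in enumerate(B):
            if discriminant(u, A, cA, B, cB) > 0:   # class B vector recognized as A
                cB[i] += 1
                changed = True
        if not changed:
            break
    return cA, cB
```

After training, a new vector `u` is assigned to class A when `discriminant(u, A, cA, B, cB)` is positive and to class B otherwise.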

4     Procedures  for  performance  evaluation 

Two types of tests have been used for the performance evaluation. The first is a computationally inexpensive single test procedure, while the second is a form of cross-validation:

A)  Single Test Procedure:

1.  Randomly divide each class (nonSIL and SIL) into two approximately equal subsets.

2.  Perform the procedure described in section 3.5 twice. First let SUBSET 1 be the learning sample and SUBSET 2 be the testing sample. Then reverse this, using SUBSET 2 as the learning sample and SUBSET 1 as the testing sample.

Recognition results for both applications of the method are then combined into the following standard performance table (or confusion matrix):


DISCR.  DECISION 

CLASS  BY  BIOPSY 

nonSIL(N) 
SIL(P) 

nonSIL(N)  SIL(P) 
Tn  Fp 
Fn  Tp 

where:  Tp  -  True  Positives:the  number  of  SIL 
cases  correctly  recognized  as  SIL; 

Fp  -  False  Positives:the  number  of  SIL  cases  in- 
correctly recognized  as  nonSIL; 

Tn  -  True  Negatives:  the  number  of  nonSIL  cor- 
rectly recognized  as  nonSIL 

Fn  -  False  Negatives:  the  number  of  nonSIL  in- 
correctly recognized  as  SIL 


Their percentages with respect to the correct categorization (based
on biopsy results) evaluate the performance of the discriminant
method in terms of true positive and true negative frequencies (Tp%
and Tn%, respectively). Their percentages with respect to the
classifier decision give the predictive power of a positive result
and of a negative result (Pp% and Pn%, respectively).
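
As an illustration, the four percentages can be computed from the confusion matrix counts as in the short Python sketch below. The function name and the dictionary keys are illustrative assumptions; the sketch assumes every row and column total is non-zero and does not guard against division by zero.

```python
def performance(tp, fp, tn, fn):
    """Row-wise and column-wise percentages of a 2x2 confusion matrix."""
    return {
        # with respect to the biopsy class (rows of the table)
        "Tp%": 100.0 * tp / (tp + fn),   # SIL cases recognized as SIL
        "Tn%": 100.0 * tn / (tn + fp),   # nonSIL cases recognized as nonSIL
        # with respect to the classifier decision (columns of the table)
        "Pp%": 100.0 * tp / (tp + fp),   # predictive power of a positive result
        "Pn%": 100.0 * tn / (tn + fn),   # predictive power of a negative result
    }
```

For made-up counts such as `performance(40, 10, 45, 5)`, the positive decisions comprise 40 true and 10 false positives, so Pp% is 80.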

B) Multiple test procedure:

The test described in A) above is performed 100 times, once for
each of 100 random splits of the data set into two parts. Summary
statistics for the frequencies of true positives, false positives,
true negatives and false negatives then provide the performance
results for the discriminant method. This is a type of
cross-validation.
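
A minimal Python sketch of the multiple test procedure follows. The callable `fit_predict` is a hypothetical stand-in for the learning and recognition procedure of section 3.5, and the 0/1 label coding, the function name and the seed are assumptions made for the illustration.

```python
import random


def repeated_split_test(X, y, fit_predict, repeats=100, seed=0):
    """Accumulate confusion counts over `repeats` random two-way splits.

    y holds 1 for SIL (positive) and 0 for nonSIL (negative).
    fit_predict(X_learn, y_learn, X_test) is a hypothetical callable that
    trains on the learning sample and returns predicted labels for X_test.
    """
    rng = random.Random(seed)
    totals = {"Tp": 0, "Fp": 0, "Tn": 0, "Fn": 0}
    for _ in range(repeats):
        subset1, subset2 = [], []
        for label in (0, 1):                 # split each class separately
            members = [i for i, t in enumerate(y) if t == label]
            rng.shuffle(members)
            half = len(members) // 2
            subset1 += members[:half]
            subset2 += members[half:]
        # each subset serves once as learning sample and once as testing sample
        for learn, test in ((subset1, subset2), (subset2, subset1)):
            predicted = fit_predict([X[i] for i in learn],
                                    [y[i] for i in learn],
                                    [X[i] for i in test])
            for i, guess in zip(test, predicted):
                if guess == 1:
                    totals["Tp" if y[i] == 1 else "Fp"] += 1
                else:
                    totals["Tn" if y[i] == 0 else "Fn"] += 1
    return totals
```

Because every observation lands in exactly one testing sample per split, a data set of n observations yields 100 n test results in total.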

5     Results  of  ColpoProbe  Data  Analysis 

All the results described below were obtained by applying the
potential function method to significant Boolean variables derived
from the entire set of 81 observations. From an application
perspective it would have been statistically more appropriate to
derive these Boolean variables from the learning set alone, but the
present analysis suits the purpose of evaluating the potential
function learning method. The cut-point procedure produces 20
significant Boolean variables generated from the set of 81
observations.

For the selected variables we performed 100 tests of the
discriminant function using 100 random splits, yielding the
performance results in Table 2. The first, fourth and seventh lines
of the table give the numbers of observations tested; because every
original observation was tested 100 times, the total number of
tests was 8,100. The second and fifth lines of the table present
the results with respect to rows, and the third and sixth lines
present them with respect to columns.

Summary statistics for the percentages are given in Table 3.

The best performance is given by a split with the results shown in
Table 4.
Table 2: Results of 100 tests

Abstract. This paper proposes a cyclic system for learning the
hierarchical models of inheritance of Multiagent Systems (MAS),
using a logical framework. The system acquires the appropriate
models from the descriptions of the models that are most
informative with respect to the learning process. The most
informative models are determined by analysing the hypothetical
space of the models and by evaluating the problem solving
capabilities of the MAS based on these models. The system has two
advantages: its components are symbolic, and it requires a
significantly smaller number of "training" models. This allows
learning of "understandable" symbolic descriptions of the
hierarchical models of inheritance in reasonable time.

Keywords:  Machine  Learning,  Multiagent  Systems, 
Logic  Programming. 

1  Introduction 

Learning in MAS lies at the intersection of the multiagent paradigm
and the machine learning paradigm [3, 8]. It is motivated by the
insight that it is not possible to determine a priori the complete
knowledge that must exist within each component of a distributed
system in order for the system to perform satisfactorily. It is
therefore broadly agreed, in both the Distributed Artificial
Intelligence and Machine Learning communities, that there is a need
to provide these systems with the ability to learn, i.e. to improve
their own future behaviour.

Learning in MAS consists of two types of learning. The first is
single-agent learning, i.e. learning that is done by a single agent
as a result of its interaction with the environment or with a set
of other agents. This type of learning presupposes the application
of traditional machine learning techniques. The second is
distributed learning, defined as learning that is possible only
because several agents are present. This type of learning comprises
learning in teams, learning to act in teams, learning to coordinate
agents, learning the organisational structure of a MAS, etc. The
main techniques used in distributed learning are borrowed from
Reinforcement Learning and Neural Networks [6, 8]. They ensure
perceptiveness, correctness and reactiveness of the MAS, but at the
same time retain the main shortcomings of the non-symbolic
approaches to Artificial Intelligence, namely:
•  incomprehensibility of the knowledge and behaviour of the
systems;
•  high time complexity as a result of exploring a sufficient
number of environmental states.
The research considered in this paper aims at overcoming these
problems in learning the organisational structure of a MAS that is
expressed in the disjunctive logic programming paradigm with
inheritance [1]. Learning the organisational structure is therefore
reduced to the task of learning the hierarchical models of
inheritance. We propose to solve this task in a closed cycle:

WHILE there is no acceptable description of the models of
inheritance DO
    1. Determine whether the performance of the MAS is satisfactory
       with respect to its current model of inheritance;
    2. Treat the description of the model as an example with a
       positive or negative classification with respect to the
       performance;
    3. Learn the general descriptions of the "good" models with
       the new examples;
    4. Determine the most informative models (examples) with
       respect to learning;
    5. Replace the model of the system with the most informative
       one.
END

Steps 1-2 and 4-5 are completely symbolic sub-processes that
require explicit descriptions of the inheritance models. The
learning step therefore obeys the same requirement, and the
communication within the cycle is accomplished on a symbolic level.
This means that the intermediate results of the learning process
are symbolic descriptions of the models of inheritance. Their
acquisition is complete when it is no longer possible to determine
the most informative models, or when the designers of the MAS
accept some of the acquired descriptions of the "good" models as
satisfactory. The process of learning is therefore completely
autonomous and requires only the most informative models, in
contrast with existing algorithms [8]. The final results are
symbolic descriptions of the models of inheritance that guarantee a
satisfactory performance and understandable behaviour of the
corresponding MAS.

This paper sketches the logical knowledge representation scheme in
section 2. Section 3 considers a system for learning the models of
inheritance, based on the proposed cycle. Section 4 analyses the
system and proposes future directions for research.
2  Knowledge Representation with Inheritance

This work considers knowledge representation by means of a
hierarchical model with inheritance that supports the treatment of
incomplete information, allowing for commonsense reasoning.

The system structure is supported by two kinds of entities: agents,
which represent specific entities such as objects or their
individual classes or instances; and links, which formalise
relationships among those entities (Figure 1). A link may represent
a hierarchical relation between agents, or the association of a set
of data with an agent (the agent's theory).


Figure  1.  A  Hierarchical  Model  of  Inheritance 


In terms of a logic program, this kind of relation between two
agents can be described by a predicate of the ISA-link type found
in most work on hierarchical structures, namely:
isa: Agent, Class, Cancelled

where the first argument names the agent in the hierarchical
structure, linked to the class given by the second argument. The
last argument is the set of information that must be cancelled in
the inheritance process.

For example, one may say that Tweety is a bird for which the
knowledge about whether it may or may not fly is not inherited.
This is represented by the production:
isa(tweety, bird, [fly])

For the agent's knowledge, a predicate is defined whose first
argument identifies the agent and whose second argument is its
theory:
agent: Agent, Theory

The fact that birds can fly and have wings, but do not have wheels,
can be declared as:

agent(bird, [fly, wings, ¬wheels])

Note that classical negation, '¬', is used to denote knowledge
that must be interpreted as known to be false, as distinct from
situations where one has no information about the truth of the
knowledge.

In this system an agent's data is represented at the level at which
the agent appears in the hierarchy. Agents that are subclasses of
other entities may inherit knowledge from the hierarchy. The proof
mechanism that provides this behaviour may be stated as follows:
•  a question may be proved at the agent's level; or
•  the solution to a question may be found at a level of the
hierarchy above the agent to which the question was originally
posed.

This type of reasoning is performed by a proof predicate, defined
as:
proof: Agent, Question

where Agent identifies the agent questioned and Question represents
the question itself.

The first bullet above can be declared as:
proof(Agent, Question) ←
    agent(Agent, Theory),
    process(Question, Theory)

where "←" reads "if": the proof of a question stated for a specific
agent is found if it is possible to process that question in the
agent's theory.

The second bullet states that the solution to a question must be
searched for up the hierarchy, through those branches with
permission to inherit knowledge:

proof(Agent, Question) ←
    isa(Agent, Class, Cancel),
    not exists(Question, Cancel),
    proof(Class, Question)
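
The two proof rules can be mirrored in a small Python sketch using the Tweety example from this section. It is only an illustration under stated assumptions: `process` is taken to be plain membership of the question in the theory, `exists` is taken to be membership in the cancel list, the string "-wheels" stands for the negated literal, the empty theory for tweety is added for the example, and the hierarchy is assumed to be acyclic.

```python
# agent(Agent, Theory): each agent's own theory
AGENTS = {
    "bird": {"fly", "wings", "-wheels"},
    "tweety": set(),                      # assumed: no facts of its own
}

# isa(Agent, Class, Cancelled): links with the knowledge that is not inherited
ISA = [
    ("tweety", "bird", {"fly"}),
]


def proof(agent, question):
    """True if the question is provable for the agent, locally or by inheritance."""
    # Rule 1: the question is processed in the agent's own theory.
    if question in AGENTS.get(agent, set()):
        return True
    # Rule 2: search up the hierarchy through links that do not cancel it.
    for child, parent, cancelled in ISA:
        if child == agent and question not in cancelled:
            if proof(parent, question):
                return True
    return False
```

Here `proof("tweety", "wings")` succeeds through the link to bird, while `proof("tweety", "fly")` fails because fly is in the cancel list of that link.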

3  Learning  Hierarchical  Models  of  Inheritance 

Learning hierarchical models of inheritance, based on the proposed
knowledge representation scheme, is needed because it is not
possible to determine a priori models that guarantee a satisfactory
performance of the systems. The hierarchical models of the systems
therefore have to be refined (learned) using the feedback from
their decision making processes. This section defines the learning
task and proposes a system for solving it.

3.1  The  Definition  of  the  Task 

A system based on the knowledge representation scheme from section
2 can be considered as a MAS. The system consists of a set of
agents and classes whose relationships are determined by a given
hierarchical model of inheritance. The task of learning the models
is the task of acquiring new models that ensure the effectiveness
of the system with respect to a given criterion of performance.

The Task of Learning the Hierarchical Models of Inheritance

Given:

•  A multiagent system:

    •  A set of agents and agent classes;

    •  A hierarchical model of inheritance.

•  A criterion of performance of the system;

•  A set of the tasks of the system.

Find:

•  The hierarchical models of inheritance of the system that
maximise its performance criterion.

The task of learning the models of inheritance is a task of
learning the relationships among the agents and their classes. It
is motivated by the fact that the model of inheritance is formed by
such relationships, and is therefore described by a conjunction of
the corresponding ISA predicates. As different hierarchical models
can have a "positive or negative classification" with respect to
the performance criterion of the system, their ISA descriptions can
be considered as training examples of these models. The examples
are formed gradually, as the system moves from one model to another
and each model is associated with the classification determined by
the criterion of performance. This means that the task of learning
models of inheritance is an incremental one.

The Hierarchical Model of Inheritance (diagram omitted)

The correspondence between the ISA predicates and the attributes in
the DNF language:

isa(ag2, ag1, [Y]) ↔ A1 = 0
isa(ag2, ag1, [])  ↔ A1 = 1
isa(ag3, ag1, [Y]) ↔ A2 = 0
isa(ag3, ag1, [])  ↔ A2 = 1
isa(ag3, ag2, [Y]) ↔ A3 = 0
isa(ag3, ag2, [])  ↔ A3 = 1
isa(ag4, ag2, [Y]) ↔ A4 = 0
isa(ag4, ag2, [])  ↔ A4 = 1

Figure 2. An Example of a Multiagent Logic Programming System, its
Hierarchical Model of Inheritance, and Corresponding Description
Languages

The result of the learning process has to be the set of the most
important models with respect to high-quality performance of the
system. It is defined as a set of models such that there is no
other set with a smaller number of models that leads to
satisfactory performance of the system.

3.2  The  Choice  of  the  Description  Language  of  the 
Hierarchical  Models  of  Inheritance 

The description of models of inheritance has to be made within
languages that ensure solvability of the incremental learning task
in real time. DNF attributive languages are therefore chosen,
because the corresponding learning algorithms are very simple and
have suitable average complexity for both batch and incremental
learning.

Encoding the descriptions of models of inheritance in DNF
attributive languages is possible by the following scheme: every
ISA link corresponds to an attribute, and every cancelled list
corresponds to an integer (Figure 2). The integer represents a
vector of N binary digits, where N is the number of possible
predicates whose extensions can be inherited. A position in the
vector thus corresponds to a predicate, and the value 1 (0) of the
digit means that the extension of the predicate can (cannot) be
inherited. The integer belongs to the closed interval
[0, 2^N - 1], whose boundaries mean forbidden and full inheritance,
respectively.
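
The coding scheme can be illustrated with a short Python sketch. The bit order (the first predicate in the list occupies the least significant bit) and the function names are assumptions made for the illustration; the paper fixes only that a digit 1 means "can be inherited" and 0 means "cancelled".

```python
def encode_cancelled(cancelled, predicates):
    """Map a cancelled list to the integer value of the link's attribute."""
    code = 0
    for position, name in enumerate(predicates):
        if name not in cancelled:        # digit 1: the extension is inherited
            code |= 1 << position
    return code


def decode_cancelled(code, predicates):
    """Recover the cancelled list from the integer value of the attribute."""
    return [name for position, name in enumerate(predicates)
            if not (code >> position) & 1]
```

With a single inheritable predicate Y, as in Figure 2, `encode_cancelled(["Y"], ["Y"])` gives 0 (forbidden inheritance) and `encode_cancelled([], ["Y"])` gives 1 (full inheritance).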

This scheme of representation aims at decreasing the description
complexity of the learning task while preserving the exactness of
the model descriptions. In this way the task can be solved in
reasonable time by incremental decision tree learning.

3.3  Incremental  Decision  Tree  Learning 

Three requirements determine the ITI algorithm as the most suitable
one [7]: i) the task of learning the models of inheritance is
incremental; ii) the learning has to identify the important models;
and iii) the representation scheme of the models is attributive.
ITI is an algorithm for incremental concept learning by induction
on decision trees that handles both numeric and symbolic values of
the attributes of examples in DNF attributive languages.
Understanding the algorithm requires considering the notion of a
decision tree and the method of learning.

3.3.1  Decision  Trees 

A decision tree with respect to a set of pre-classified examples
presented in a conjunctive attributive language is a tree whose
nodes correspond to different attributes of the language and whose
branches (arcs) correspond to different values of the attributes,
so that every example from the set can be coded by simply
traversing the tree from the root to the example's leaf (Figure 3).
Decision trees are used in induction because the process of
learning consists of finding the nodes within the tree that
correspond to examples with the same classification. The
descriptions of the paths from the root of the tree to these nodes
are expressed in the conjunctive attributive language. Each of
these descriptions is discriminant, i.e. it contains only those
attributes whose values discriminate the examples of the
corresponding node from other examples. Therefore the disjunction
of the conjunctive descriptions with the same classification forms
the disjunctive discriminant description (in DNF attributive
languages) of the concept associated with that classification. This
is a natural result in decision tree learning, which can be
re-interpreted as acquiring the maximal descriptions of the
concepts in the partially ordered DNF attributive languages.

A pre-classified set of examples of the model of inheritance

Training data        Classification
A1  A2  A3  A4       of the examples
0   0   0   1        -
0   0   1   1        +
0   1   0   1        -
0   1   1   1        -
1   1   1   1        +

The Corresponding Decision Tree (diagram omitted)

The resulting description of the "good" hierarchical models of
inheritance, derived by traversing the paths in the tree:
((A2 = 0 ∧ A3 = 1) ∨ (A1 = 1 ∧ A2 = 1)) ↔
↔ ((isa(ag3, ag1, [Y]) ∧ isa(ag3, ag2, [])) ∨ (isa(ag2, ag1, []) ∧ isa(ag3, ag1, [])))

Figure 3. A Decision Tree from the Training Examples of the
Inheritance Model

The main problem in decision tree learning is that the number of
possible decision trees is equal to the number of possible
combinations of the attributes in the languages. That is why the
problem of finding a minimal decision tree is very important and,
at the same time, computationally intractable in the general case.
In spite of this negative fact, there exists a general decision
tree algorithm [4] that builds decision trees using a top-down,
divide-and-conquer approach: select an attribute; divide the
example set into subsets characterised by the possible values of
the attribute; and follow the same procedure recursively with each
subset until no subset contains examples with different
classifications. The algorithm is able to construct near-optimal
decision trees when the choice of the attribute at every step is
made by the following information-theoretic method. Suppose that
the training example set contains p positive and n negative
examples. Any attribute A divides the training set E into the
subsets E1, ..., Er according to its r distinct values, and each
subset Ei has pi positive and ni negative examples. The information
gain from the test of the attribute A is then:

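$$\mathrm{Gain}(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \sum_{i=1}^{r} \frac{p_i+n_i}{p+n}\, I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$$

where I is the information function.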

In this way, at every step the algorithm calculates the information
gain of the attributes and chooses the one with the best gain. This
yields near-optimal decision trees in batch mode, as shown in [4].
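
The attribute selection step can be sketched in Python as follows. The paper does not spell out the information function, so the sketch assumes the usual two-class entropy in bits, I(a, b) = -a log2 a - b log2 b. The toy data set uses the five attribute vectors of Figure 3 with their classifications taken as an assumption, and all names are illustrative.

```python
import math


def information(a, b):
    """Assumed information function: entropy in bits of the distribution (a, b)."""
    return -sum(x * math.log2(x) for x in (a, b) if x > 0)


def gain(examples, attribute):
    """Information gain of testing `attribute`.

    examples: list of (values, is_positive) pairs, values being a tuple.
    """
    p = sum(1 for _, positive in examples if positive)
    n = len(examples) - p
    subsets = {}                                  # attribute value -> labels
    for values, positive in examples:
        subsets.setdefault(values[attribute], []).append(positive)
    remainder = 0.0
    for labels in subsets.values():
        p_i = sum(labels)
        n_i = len(labels) - p_i
        remainder += (p_i + n_i) / (p + n) * information(
            p_i / (p_i + n_i), n_i / (p_i + n_i))
    return information(p / (p + n), n / (p + n)) - remainder


# Toy data: attribute vectors (A1, A2, A3, A4) with assumed classifications.
EXAMPLES = [
    ((0, 0, 0, 1), False),
    ((0, 0, 1, 1), True),
    ((0, 1, 0, 1), False),
    ((0, 1, 1, 1), False),
    ((1, 1, 1, 1), True),
]

best = max(range(4), key=lambda a: gain(EXAMPLES, a))   # index of the best attribute
```

On this toy set the attribute with the largest gain is A3 (index 2), while A4, which takes the same value in every example, has zero gain.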

3.3.2  Incremental  Algorithm 

The ITI algorithm builds decision trees (DT) incrementally, in
contrast with the general decision tree algorithm (Figure 4). It
uses the following scheme. The initial tree is the empty tree
(NIL). When an example is to be incorporated and the tree is NIL,
the tree is replaced by a leaf that indicates the class of the
example, and the example is attached to the leaf. Otherwise, the
branches of the tree are followed according to the values in the
example until a leaf is reached. If the example has the same
classification as the leaf, the example is simply added to the set
of examples saved at the leaf. If the example has a different
classification from the leaf, the algorithm attempts to turn the
leaf into a decision node by choosing a proper attribute.
Immediately after an example is incorporated, the tree is
rebalanced according to the gain expression.


ITI(DT: decision tree, E: training example set)
FOR every example e ∈ E DO
    IF DT is NIL
    THEN
        Replace DT by a leaf node L with classification
        equal to the classification of e, and attach e to L;
    ELSE
        Classify e in DT and find the leaf L in DT
        that corresponds to e;
        IF e has the same classification as the leaf L
        THEN
            Add e to the example set of L;
        ELSE
            Convert the leaf L into a decision node DN
            and determine the partition attribute of DN;
            Extend DT with a sub-tree of DN;
    Balance DT.
END.

Figure 4. The ITI Algorithm
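
The incorporation step of Figure 4 can be sketched in Python. This is a simplified illustration, not the ITI implementation of [7]: it omits the "Balance DT" restructuring step, represents the NIL tree as an empty leaf, and chooses the partition attribute by entropy-based information gain. All class and function names are hypothetical.

```python
import math
from collections import Counter


class Node:
    """A leaf stores examples and a class; a decision node stores a test attribute."""
    def __init__(self):
        self.examples = []      # (values, label) pairs kept at a leaf
        self.label = None       # class of a leaf
        self.attribute = None   # index of the tested attribute (decision node)
        self.children = {}      # attribute value -> child Node


def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * math.log2(c / len(labels)) for c in counts.values())


def best_attribute(examples):
    """Attribute with the highest gain among those that actually split the set."""
    base = entropy([label for _, label in examples])
    best, best_gain = None, -1.0
    for a in range(len(examples[0][0])):
        groups = {}
        for values, label in examples:
            groups.setdefault(values[a], []).append(label)
        if len(groups) < 2:
            continue
        g = base - sum(len(ls) / len(examples) * entropy(ls) for ls in groups.values())
        if g > best_gain:
            best, best_gain = a, g
    return best


def settle(node):
    """Keep node as a pure leaf, or convert it into a decision node and recurse."""
    labels = {label for _, label in node.examples}
    if len(labels) == 1:
        node.label = labels.pop()
        return
    attribute = best_attribute(node.examples)
    if attribute is None:       # identical descriptions with different classes
        return
    node.attribute, node.label = attribute, None
    examples, node.examples = node.examples, []
    for values, label in examples:
        node.children.setdefault(values[attribute], Node()).examples.append((values, label))
    for child in node.children.values():
        settle(child)


def incorporate(tree, values, label):
    """Follow the branches to a leaf, attach the example, and split if needed."""
    node = tree
    while node.attribute is not None:
        node = node.children.setdefault(values[node.attribute], Node())
    node.examples.append((values, label))
    settle(node)


def classify(tree, values):
    node = tree
    while node is not None and node.attribute is not None:
        node = node.children.get(values[node.attribute])
    return None if node is None else node.label
```

Starting from `tree = Node()` and calling `incorporate` once per example grows a tree that is consistent with all examples seen so far, provided no two examples share a description but differ in class.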

3.3.3  The  Most  Informative  Learning 

Incremental learning can be sped up if it is carried out with the
most informative training examples, defined as examples that
correspond to half of the descriptions in the hypothetical space of
the concepts to be learned. With such examples, regardless of their
classification, learning is sped up dramatically, because half of
the currently plausible concept descriptions are rejected.

Applying this idea to decision tree learning presupposes
determining the DNF hypothetical spaces of the concepts represented
in decision trees (Figure 5). These hypothetical spaces are
determined by two boundaries in the partially ordered structure of
DNF attributive languages [2]. The first, the upper boundaries, are
the disjunctive discriminant descriptions of the concepts, which
according to the properties of decision trees are the maximal
descriptions of the concepts in the DNF attributive languages. The
second, the lower boundaries, are disjunctions of examples of the
concepts, which ensures their minimality in these languages [5].

The structure of DNF hypothetical space (lattice diagram; nodes
listed from the most general to the most specific level):

A1=? ∧ A2=0 ∧ A3=1 ∧ A4=?        A1=1 ∧ A2=1 ∧ A3=? ∧ A4=?

A1=0 ∧ A2=0 ∧ A3=1 ∧ A4=?        A1=? ∧ A2=0 ∧ A3=1 ∧ A4=1
A1=1 ∧ A2=1 ∧ A3=1 ∧ A4=?        A1=1 ∧ A2=1 ∧ A3=? ∧ A4=1

A1=0 ∧ A2=0 ∧ A3=1 ∧ A4=1        A1=1 ∧ A2=1 ∧ A3=1 ∧ A4=1

The upper boundary:
(A2 = 0 ∧ A3 = 1) ∨ (A1 = 1 ∧ A2 = 1)

The lower boundary:
(A1 = 0 ∧ A2 = 0 ∧ A3 = 1 ∧ A4 = 1) ∨ (A1 = 1 ∧ A2 = 1 ∧ A3 = 1 ∧ A4 = 1)

The two conjunctive hypothetical spaces in the DNF space are shown
with different line styles.

The upper and lower boundaries of the first conjunctive
hypothetical space are, respectively:
(A2 = 0 ∧ A3 = 1) and (A1 = 0 ∧ A2 = 0 ∧ A3 = 1 ∧ A4 = 1)

The upper and lower boundaries of the second conjunctive
hypothetical space are, respectively:
(A1 = 1 ∧ A2 = 1) and (A1 = 1 ∧ A2 = 1 ∧ A3 = 1 ∧ A4 = 1)

Figure 5. The Hypothetical Spaces of the "Good" Models of
Inheritance

The most informative examples with respect to the first conjunctive
hypothetical space:
(A1 = 1 ∧ A2 = 0 ∧ A3 = 1 ∧ A4 = 1) and
(A1 = 0 ∧ A2 = 0 ∧ A3 = 1 ∧ A4 = 0).
These examples are near-miss examples with respect to
(A1 = 0 ∧ A2 = 0 ∧ A3 = 1 ∧ A4 = 1) and correspond to
(A2 = 0 ∧ A3 = 1).

Figure 6. Determining the Most Informative Examples with respect to
the First Conjunctive Hypothetical Space (presented in Figure 5).

The DNF hypothetical spaces can be considered as unions of
conjunctive hypothetical sub-spaces (Figure 5). These sub-spaces
are determined by the conjunctive discriminant descriptions of the
concepts (upper boundaries) and the corresponding concept examples
(lower boundaries), and they are mutually disjoint with respect to
examples, since their upper boundaries are discriminant. Therefore
it is not possible to determine the most informative examples for
DNF hypothetical spaces as a whole. It is possible, however, for
the conjunctive hypothetical sub-spaces, since they are convex and
definite (for the chosen coding scheme), and there exists a simple
near-optimal heuristic procedure for determining the most
informative examples with respect to them. It takes as the most
informative examples the near-miss examples, i.e. the examples
whose descriptions differ from the descriptions in the lower
boundaries in the value of exactly one attribute (Figure 6) [2].
The simplicity of the procedure makes it possible to develop an
oracle that supports most informative incremental learning of
decision trees (Figure 7). The oracle takes a decision tree as
input and produces the most informative training examples with
respect to the tree and the concepts to be learned. To do so, the
oracle determines the conjunctive hypothetical spaces of the
concepts by acquiring their boundaries from the decision tree. The
lower boundaries serve to determine the most informative examples
by means of the heuristic procedure, and the upper boundaries serve
to sift out those examples that do not correspond to the spaces.

Oracle(DT: decision tree; {E_C}: set of the sets of the most
informative examples of the concepts)
Determine the upper (U) and lower (L) boundary sets of the
conjunctive hypothetical spaces of the concepts C encoded in DT;
FOR every concept C DO
    FOR every boundary set of concept C DO
        Determine the set of near-miss examples E'_C
        with respect to L;
        Remove from E'_C those examples that do not
        correspond to U;
        E_C = E_C ∪ E'_C
END

Figure 7. The Oracle for Determining the Most Informative Examples
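
The near-miss step of the oracle for one conjunctive hypothetical space can be sketched in Python as below. The representation is an assumption made for the illustration: the lower boundary is a fully specified tuple of attribute values, the upper boundary is a dictionary from attribute index to required value, and attributes are binary unless other values are passed in.

```python
def near_misses(lower, upper, values=(0, 1)):
    """Near-miss examples of one conjunctive hypothetical space.

    lower:  tuple of attribute values (an example on the lower boundary)
    upper:  dict {attribute index: required value} (the upper boundary)
    Returns the examples that differ from `lower` in exactly one attribute
    and still satisfy the upper boundary.
    """
    result = []
    for i, current in enumerate(lower):
        for v in values:
            if v == current:
                continue
            candidate = lower[:i] + (v,) + lower[i + 1:]
            # sift out candidates that do not correspond to the upper boundary
            if all(candidate[j] == required for j, required in upper.items()):
                result.append(candidate)
    return result


# First conjunctive space of Figure 5: lower (0, 0, 1, 1), upper A2 = 0 and A3 = 1
first_space = near_misses((0, 0, 1, 1), {1: 0, 2: 1})
```

For the first conjunctive space this yields (1, 0, 1, 1) and (0, 0, 1, 0), the two examples listed in Figure 6.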

3.4  A  System  for  Learning  of  the  Hierarchical 
Model  of  Inheritance 

The system for learning the hierarchical models of inheritance
consists of six embedded sub-systems that form a closed cycle
(Figure 8). The first one is a MAS whose models of inheritance have
to be determined. This system tries to solve a particular task from
a given task set, with its initial model of inheritance determined
in advance. The results of the problem solving process are analysed
by the second sub-system, which identifies their quality by a given
criterion of performance of the MAS. The logic description of the
current hierarchical model is translated into the DNF attributive
language by the third sub-system, which uses the coding scheme
proposed in sub-section 3.2. The resulting attributive description
is considered as a training example with positive or negative
classification, depending on the quality of the current solution.
The fourth sub-system, the ITI algorithm, updates the decision tree
of the concept corresponding to the models of inheritance with the
new example. The new decision tree is used by the fifth sub-system,
the oracle, which determines an ordered set of new most informative
examples. This set of examples is translated by the sixth
sub-system from the DNF attributive language back to the initial
logic language. The first example of the set is the description of
the new most suitable model of inheritance with respect to the
learning process. The MAS therefore changes its model of
inheritance according to the new description and continues to solve
its tasks. This closed process of learning the hierarchical models
of inheritance goes on until the oracle is no longer able to
generate new examples, or until the designers of the MAS accept
some of the models as satisfactory. In both cases the final result
is a symbolic description of the "good" inheritance models of the
MAS.

4  Conclusion 

This article proposes a system for learning the hierarchical models
of inheritance of MAS that are represented under the logical
knowledge representation scheme presented in section 2. The main
advantages of the system are the following:

•  The system acquires symbolic descriptions of the most important
models of inheritance, which ensure satisfactory and understandable
performance of the corresponding MAS.

•  The system learns the models of inheritance in reasonable time,
because only the most informative examples of the models are used,
which leads to the rejection of half of the hypothetical space of
models in the average case. This property contrasts the proposed
system with the existing systems for refinement of the agent
relationships, which use arbitrary examples [6, 8].

•  The system is completely autonomous, i.e. it does not require
external help.

•  The system can be applied to different MAS with inheritance
models.

The main shortcomings of the system point to directions for future
research:

•  Impossibility to learn when the criterion of performance of the
MAS is not well-defined, which leads to the presence of noise in
the training descriptions of the models;

•  Impossibility to learn when the agents' theories are changed. In
this case some of the models of inheritance change their
performance classification and the learned "good" models become
invalid.

Figure 8. The System for Learning of the Models of Inheritance

The two shortcomings correspond to learning in the presence of
noise and of concept drift, respectively. They can be easily
avoided for the learning sub-system, but not for the oracle. That
is why the main efforts for improvement have to be made in
oracle-based decision tree learning, a completely new area of
research in machine learning, in order to extend the capabilities
of the system.
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ABSTRACT 

Experience  shows  that  many  data  processing  problems  are 
difficult  to  solve,  and  some  of  these  problems  have  even 
been  proven  to  be  computationally  intractable.  Human 
experts  successfully  solve  many  such  problems  by  using 
a  hierarchical,  multi-resolution  approach.  These  multi- 
resolution  methods  are,  in  several  cases,  provably  opti- 
mal. However,  due  to  the  computational  intractability  of 
the  problem  itself,  the  multi-resolution  approach  can  only 
work  if  the  systems  that  we  are  analyzing  are  themselves 
hierarchical. We show that, first, due to (inevitable)
measurement inaccuracies, arbitrary input data are
consistent with the hierarchical model, and second, that in many
cases, the actual physical world is indeed fundamentally
hierarchical.

Since  traditional  statistical  methods  have  been  designed 
primarily  for  non-hierarchical  models,  their  direct  applica- 
tion to  multi-resolution  data  processing  can  lead  to  biased 
estimates. Using a simple example, we show how these
methods can be corrected to avoid this bias. Surprisingly, the
analysis of this problem leads to new, unexpected
symmetries.

KEYWORDS: multi-resolution data processing,
granularity, computational intractability, wavelets,
fractals, satellite image, semiotics, environmental studies,
Schroedinger's paradox

1.  DATA  PROCESSING  IS  DIFFICULT 

Data processing is difficult: an empirical fact. From
the engineering viewpoint, the main problem with data
acquisition and processing is to manage to acquire the data.
To launch a successful space mission to other planets is
indeed an extraordinary engineering achievement. However,
even for such missions, the sheer volume of acquired data
is so huge that processing all this data becomes a very
difficult task.

Computers  become  faster  and  faster,  new  algorithms  are 
designed,  and  therefore,  data  processing  becomes  faster 
and  faster;  however,  at  the  same  time,  this  same  progress 
leads  to  better  data  acquisition  devices  that  drastically  in- 
crease the  amount  of  raw  data,  and  to  new  ideas  of  what 
additional  information  we  can  extract  from  the  old  data.  A 
typical  NASA-related  example:  the  University  of  Texas  at 
El  Paso,  together  with  Jet  Propulsion  Lab,  is  currently  an- 
alyzing the  data  from  Mariner  missions  to  find  relativistic 
effects  on  the  spaceships'  trajectories. 
The complexity of data processing is a difficult problem for
many different application areas, but is especially difficult
for areas in which it is relatively easy to get new data.
Environmental and earth studies are one such area: lots
of relatively easily accessible data come from satellites, and
processing this data becomes more and more difficult.
This problem is going to become even more acute in the
near future. Currently, environment-related
satellites use a few frequencies (no more than 10). From
the resulting measurements, we only get a small sample
of the spectrum, and from this small portion, it is often
difficult to tell one type of terrain from another (or even
from a cloud formation). To get a better understanding
of Earth features, NASA is currently planning to launch a
series of new multi-spectral satellites that, for each point,
will measure up to several hundred intensities instead of
the usual few. This increase in data flow is definitely
advantageous, but it makes data processing even more
complicated.

Data  processing  is  difficult:  theoretical  results.  Ev- 
ery once  in  a  while,  new  algorithms  appear  that  drastically 
decrease  the  computation  time  of  different  data  processing 
problems.  This  continuous  progress  in  algorithm  design 
may lead to an (over)optimistic viewpoint that sooner or
later, ideal algorithms will emerge that will perform all the
data processing tasks in real time (i.e., the current data will
be processed by the time new data arrive).
Alas,  this  optimism  is  unfounded:  numerous  theoretical 
results  show  that  in  the  general  case,  data  processing  prob- 
lems are  computationally  intractable  (or,  to  use  the  precise 
term  from  the  theory  of  computing,  NP-hard).  This  is  true 
for  general  data  processing  (see  [8]  and  references  therein), 
for  the  problems  of  reconstructing  the  past  [1,2],  for  prob- 
lems of  quantum  mechanics  [10]  and  space-time  geometry 
[7],  etc. 

Crudely speaking, these results mean that for any
algorithm that solves a data processing problem, there exist
possible data on which this algorithm takes exponentially
long time, i.e., time that grows as $2^n$, where $n$ is the length
of the input (measured, e.g., in bits). Already for
reasonably small $n$ (e.g., for $n \approx 300$), the required time exceeds
the lifetime of the Universe. In short, such problems are
indeed intractable.
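
To make the scale of exponential growth concrete, the following sketch compares $2^n$ elementary steps with the number of steps a machine could perform over the age of the Universe. The machine speed and the age of the Universe used here are illustrative assumptions, not values taken from the text.

```python
# Illustrative assumptions (not from the text):
OPS_PER_SECOND = 10**18           # a hypothetical very fast machine
AGE_OF_UNIVERSE_YEARS = 13.8e9    # a commonly quoted estimate
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def steps_available():
    """Elementary steps the hypothetical machine performs in the age of the Universe."""
    return OPS_PER_SECOND * AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR


def exceeds_universe(n):
    """True if 2**n steps cannot be completed within the age of the Universe."""
    return 2**n > steps_available()


if __name__ == "__main__":
    for n in (50, 100, 200, 300):
        print(n, exceeds_universe(n))
```

Under these assumptions, an input of only a few hundred bits is already out of reach, which is the point of the estimate above.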

The  problem.  The  problem  is:  how  to  process  real-life 
data? 

2.  MULTI-RESOLUTION  DATA  PRO- 
CESSING IS  NECESSARY 

How do we humans solve this problem? Enter the
idea of multi-resolution data processing. There are
many data processing problems in which human experts
are much better than computers, image processing being one of
them. To be more precise:

•  If  we  have  a  small  amount  of  data,  e.g.,  a  blurry  noisy 
image,  a  human  eye  will  only  see  the  noise,  while  a 
computer  can  perfectly  well  filter  this  noise  out  and 
get,  e.g.,  all  these  nice  pictures  that  we  see  coming 
from  the  Hubble  telescope. 

•  On  the  other  hand,  if  we  have  a  huge  amount  of  data, 
e.g.,  if  we  need  to  identify  a  face  on  a  photo  or  a 
geological  pattern  on  satellite  image,  then  a  trained 
human  eye  does  it  in  no  time,  while  supercomputers 
often  take  forever  or  even  sometimes  fail. 

How do we humans do it? Definitely not because our brain
is faster than a computer: its main processing elements
(neurons) have a processing time of 10-100 milliseconds.
The main reason why we can do this job is that we do
not store and we do not process the pixel-by-pixel image
as a computer would normally do. Instead, we store and
remember an image in a very compressed form, usually as a
small collection of standard images described, usually, in
semiotic form, i.e., by words and symbols (mental or real).
Moreover, this description is usually hierarchical,
multi-scale: first, we remember and describe the "big picture"
(main features), then we go into more details (i.e., into a
somewhat smaller scale), etc.

Since we humans use this idea, and use it successfully, it
is desirable to make computer programs use this idea as
well.

Wavelets: a mathematical representation of the
idea of multi-resolution data processing. One of the
most practically successful formalizations of the idea of
multi-resolution data processing is the wavelet technique
(see, e.g., [13]). Crudely speaking, a wavelet transform
decomposes the original image into "sub-images" that
correspond to the large-scale details, medium-scale details, etc.
So, we can store and process these sub-images instead of
the entire original image.
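
A minimal sketch of such a decomposition, using the Haar wavelet on a one-dimensional signal for brevity. The text does not specify a wavelet family, so the choice of Haar and all function names are assumptions made for illustration. Each level splits the signal into a coarse average and the details lost by averaging.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (coarse) and half-differences (detail)."""
    coarse = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return coarse, detail


def haar_decompose(signal):
    """Full decomposition of a signal whose length is a power of two.

    Returns (overall_average, details), where details[0] holds the
    largest-scale detail and details[-1] the finest-scale details.
    """
    details = []
    current = list(signal)
    while len(current) > 1:
        current, detail = haar_step(current)
        details.insert(0, detail)
    return current[0], details


def haar_reconstruct(average, details):
    """Invert haar_decompose, going from the coarse level to the fine one."""
    current = [average]
    for detail in details:
        finer = []
        for c, d in zip(current, detail):
            finer.extend([c + d, c - d])
        current = finer
    return current
```

Stopping the reconstruction after the first few detail levels gives the "big picture" only; adding further levels restores progressively smaller-scale details.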

Multi-resolution methods are indeed optimal.
Many people have heard about wavelets, and it is
reasonably well known that in many application areas (e.g.,
in data compression) wavelet-based methods are often
indeed better than more traditional techniques. What is less
known is that wavelet-based methods are not simply
biologically motivated and empirically good: there are
mathematical results that show that in many problems,
wavelet-based methods are indeed optimal. We will mention two
such results:

•  The first result is more closely related to the human way of
data processing: it states that among all possible
neural networks, neural networks that use wavelet-type
activation functions have (asymptotically) the best
approximation property [9].

•  The second result is directly related to data
processing, namely, to image processing: it shows that in a
certain class of problems (like the problem of
automatically detecting whether a surface mounted
device is correctly mounted on a chip) wavelets are
indeed the best data compression method [6]. This
second result is not just an asymptotic optimality
result: it has actually led to successful wavelet-based
image processing results [3,5].

Multi-resolution methods may be the best of the
possible ones, but how come they are good? The
very fact that multi-resolution methods are the best (i.e.,
the fastest) of all possible methods of solving data
processing problems does not invalidate the above-cited
pessimistic result that this problem is computationally
intractable. In other words, theoretically, one can describe
possible combinations of data for which these methods will
not work.

However, both our own experience (as human experts we
use multi-resolution methods) and the experience of data
processing algorithms that use multi-resolution techniques
show that these methods are practically very feasible, in
other words, that these horrific worst cases practically do
not happen in real life. The question is: why? Is nature
designed in such a way that these methods always work, or
have we simply been lucky so far, and bad cases will still appear
in the future?

Our  answer  is  optimistic:  yes,  nature  is  designed  in  this 
manner. 

3.  MULTI-RESOLUTION  DATA  PRO- 
CESSING IS  POSSIBLE  AND  IS  FUNDA- 
MENTAL 

Two  explanations.  We  will  give  two  explanations  why 
nature  is  designed  in  this  way: 

•  Our  first  explanation  will  say,  crudely  speaking,  that 
even  when  nature  can  form  arbitrarily  complicated 
images  and  data  strings,  the  inevitable  presence  of 
noise  and  measurement  errors  makes  every  obser- 
vation compatible  with  a  hierarchical  model  (i.e., 
with  a  model  that  can  be  handled  by  multi-resolution 
techniques). 

•  Our second explanation is that not only is the
approximate image of nature hierarchical, but nature
itself is granular and hierarchical.

First explanation: Tsirelson's theorem. [15]
Tsirelson noticed that in many cases, when we reconstruct
the signal from the noisy data, and we assume that the
resulting signal belongs to a certain class, the reconstructed
signal is often an extreme point from this class. For
example, when we assume that the reconstructed signal is
monotonic, the reconstructed function is often (piecewise)
constant; if we additionally assume that the signal is smooth
(one time differentiable, from the class $C^1$), the result is
usually one time differentiable but rarely twice
differentiable, etc.

Tsirelson provides an elegant geometric explanation of this
fact: namely, when we reconstruct a signal from a mixture
of a signal and Gaussian noise, then maximum likelihood
estimation (a traditional statistical technique) means
that we look for a signal that belongs to the a priori class,
and that is the closest (in the $L^2$-metric) to the observed
"signal+noise". In particular, if the signal is determined
by finitely many (say, $d$) parameters, we must look for a
signal $s = (s_1, \ldots, s_d)$ from the a priori set $A \subset R^d$ that is
the closest (in the usual Euclidean sense) to the observed
values $o = (o_1, \ldots, o_d) = (s_1 + n_1, \ldots, s_d + n_d)$, where $n_i$
denotes the (unknown) values of the noise.
Since the noise is Gaussian, we can usually apply the
central limit theorem and conclude that the average value
of $(n_i)^2$ is close to $\sigma^2$, where $\sigma$ is the standard
deviation of the noise. In other words, we can conclude that
$(n_1)^2 + \ldots + (n_d)^2 \approx d\sigma^2$. In geometric terms, this means
that the distance $\sqrt{\sum (o_i - s_i)^2} = \sqrt{\sum n_i^2}$ between $s$ and
$o$ is $\approx \sigma\sqrt{d}$. Let us denote this distance $\sigma\sqrt{d}$ by $\varepsilon$.
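
The estimate that the noise vector has length close to $\sigma\sqrt{d}$ can be checked numerically. The following is a small simulation sketch; the function names, the seed, and the sample values of $d$ and $\sigma$ are our own illustrative choices.

```python
import math
import random


def noise_norm(d, sigma, rng):
    """Euclidean norm of d independent Gaussian noise values with standard deviation sigma."""
    return math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(d)))


def relative_deviation(d, sigma, seed=0):
    """Relative gap between the simulated norm and the predicted sigma * sqrt(d)."""
    rng = random.Random(seed)
    predicted = sigma * math.sqrt(d)
    return abs(noise_norm(d, sigma, rng) - predicted) / predicted


if __name__ == "__main__":
    for d in (10, 1000, 100000):
        print(d, relative_deviation(d, sigma=2.0))
```

The relative gap is expected to shrink as $d$ grows, which is why the distance between $s$ and $o$ can be treated as the single number $\varepsilon$.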
Let us (for simplicity) consider the case when $d = 2$, and
when $A$ is a convex polygon. Then, we can divide all points
$p$ from the exterior of $A$ that are $\varepsilon$-close to $A$ into several
zones depending on what part of $A$ is the closest to $p$: one
of the sides, or one of the vertices. Geometrically, the set
of all points for which the closest point $a \in A$ belongs
to the side $e$ is bounded by the straight lines orthogonal
(perpendicular) to $e$. The total length of this set is
therefore equal to the length of this particular side; hence,
the total length of all the points that are the closest to all
the sides is equal to the perimeter of the polygon. This
total length thus does not depend on $\varepsilon$ at all. However,
the set of all the points at the distance $\varepsilon$ from $A$ grows
with the increase in $\varepsilon$; its length grows approximately as
the circumference of a circle, i.e., as $\mathrm{const} \cdot \varepsilon$. When $\varepsilon$ increases,
the (constant) perimeter is a vanishing part of the total
length. Hence, for large $\varepsilon$, the fraction of the points that
are the closest to one of the sides tends to 0, while the
fraction of the points $p$ for which the closest is one of the
vertices goes to 1.
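
The effect can be illustrated by a small simulation. This is only a sketch under our own assumptions: the a priori set is taken to be the square $[-1, 1]^2$, the true signal sits at its centre, and the noise level is varied directly instead of fixing the distance $\varepsilon$. For a square, the closest point is obtained by clipping each coordinate.

```python
import random

HALF_SIDE = 1.0  # the a priori set is the square [-1, 1]^2 (an illustrative choice)


def closest_point_in_square(o):
    """Closest point to o in the square, obtained by coordinate-wise clipping."""
    return tuple(max(-HALF_SIDE, min(HALF_SIDE, x)) for x in o)


def vertex_fraction(sigma, samples=50000, seed=0):
    """Among noisy observations that fall outside the square, the fraction
    whose closest point in the square is a vertex rather than a side point."""
    rng = random.Random(seed)
    outside = at_vertex = 0
    for _ in range(samples):
        o = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))  # true signal at the centre
        s = closest_point_in_square(o)
        if s == o:
            continue  # the observation already lies in the a priori set
        outside += 1
        if abs(s[0]) == HALF_SIDE and abs(s[1]) == HALF_SIDE:
            at_vertex += 1
    return at_vertex / outside if outside else 0.0


if __name__ == "__main__":
    for sigma in (0.5, 10.0, 100.0):
        print(sigma, vertex_fraction(sigma))
```

For small noise almost all outside observations are closest to a side; as the noise grows, the reconstructed point is a vertex in the large majority of cases, in line with the geometric argument.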

Similar arguments can be repeated for any dimension. For
the same noise level $\sigma$, when $d$ increases, the distance
$\varepsilon = \sigma\sqrt{d}$ also increases, and therefore, for large $d$, for "almost
all" observed points $o$, the reconstructed signal is one of
the extreme points of the a priori set $A$.
It is much less probable that the reconstructed signal
belongs to a 1-dimensional face of the set $A$, and even much
less probable that $s$ belongs to a 2-D face, etc.
The main methodological consequence of this result is that
even when the actual state space is continuous, when
we determine the state from measurement results, we
inevitably obtain (most often) one of the discretely many
states. On the large-scale level, we get one of the few
clusters. When we add new measurements and thus get to
the next level, each original cluster sub-divides into new
clusters, etc., so that we get a hierarchical structure.

Comment: Schroedinger's paradox and other
methodological applications of Tsirelson's result.
In quantum mechanics, this result explains why pure states
(extremal points) are much more frequent than mixed ones;
in history, it explains why there are finitely many types of
social organization; in logic, it explains why in spite of the
clearly fuzzy character of most human reasoning, binary
logic describes most of this reasoning pretty well, etc. In
particular, it explains the famous "cat" paradox proposed
by E. Schroedinger, one of the founding fathers of quantum
mechanics.

In classical physics, it is assumed that for each state of a
physical system, every property is either true or false. For
example, a particle is either located in a certain interval
of space coordinates $[x - \Delta, x + \Delta]$, or it is not located
inside this interval. In quantum mechanics, in addition to
the states in which a particle is located within this
interval, and to the states in which the particle is definitely
outside it, there are states in which some measurements of
the coordinate will lead to results within the interval, and
some to the results outside this interval. In such states,
we cannot say that a statement "the particle is located in
the given interval" is true or that this statement is false;
at best, we can determine the probability of the "yes"
answer. (To describe such unusual "truth values", quantum
logic has been introduced.)

States with unusual "truth values" are not an exception,
but rather a general rule in quantum mechanics: e.g., for
every two states $\psi$ and $\psi'$ with certain values $\lambda \neq \lambda'$ of
a measured quantity, there exists a state called their
superposition in which the value of this quantity is no longer
certain. (In the standard formalism of quantum mechanics,
where states are described by vectors in a Hilbert space,
superposition is simply linear combination.)
Such a superposition state is easy to generate. Schroedinger
has shown that this superposition principle seemingly
contradicts our intuition: indeed, suppose that we have a cat
in a box, and a light-controlled rifle is aimed at the cat
in such a way that a left-polarized photon would trigger
the rifle and kill the cat, while a right-polarized photon
would keep the cat alive. If we send a photon with a
circular polarization (that is, according to quantum
mechanics, a superposition of left- and right-polarized states), we
would get (due to the linear character of the equations of
quantum mechanics) the superposition of the states
resulting from using left- and right-polarized photons. In other
words, we will get a superposition of dead and alive cat
states. This is, however, something that no one has ever
observed: a macroscopic object (cats included) is
either dead or alive. Tsirelson's result explains why
such non-extremal states are indeed difficult to observe.

Second explanation: Fractal (hierarchical)
structure of the Universe [11]. At first glance, the Universe
as a whole seems to be uniform: in whatever direction we
look, there is, on a large scale, approximately the same
number of galaxies. However, as early as the 19th century,
Olbers showed (in his famous paradox) that this
impression is false: If indeed the matter was homogeneously
distributed, then the total brightness of all the stars located
at distances between $R$ and $R + \Delta R$ would be proportional
to the volume $R^2 \cdot \Delta R$ of the corresponding spherical
segment. Since the brightness dims with distance as $R^{-2}$, the
resulting Earth-observed brightness would be the same
irrespective of $R$, and the total brightness caused by all the
stars would be infinite or at least very large. As a result,
argued Olbers, it would be as bright at night as it is at
daytime. The only way to avoid this paradox and to
reconcile the observable homogeneity with the observable night
darkness is to take into consideration that the Universe is
hierarchical: stars form galaxies, galaxies form galaxy
clusters, etc. The larger the scale we go to, the less space is taken
by matter, and the more by vacuum. The resulting fractal
description of matter distribution is indeed consistent.
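
Olbers' argument for the homogeneous case can be written out in one line. The symbols $\rho$ (number density of stars), $L$ (luminosity of a single star) and $R_{\max}$ (the radius out to which stars are counted) are introduced here only for illustration; they do not appear in the original argument.

```latex
\[
  B(R_{\max})
  = \int_0^{R_{\max}}
      \underbrace{4\pi R^2 \rho \, dR}_{\text{stars in the shell}}
      \cdot
      \underbrace{\frac{L}{4\pi R^2}}_{\text{observed brightness of each star}}
  = \rho L \int_0^{R_{\max}} dR
  = \rho L \, R_{\max}
  \;\longrightarrow\; \infty
  \quad \text{as } R_{\max} \to \infty .
\]
```

Each shell contributes the same amount $\rho L \, dR$, so the total grows without bound unless the density of matter decreases with scale.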
Olbers' paradox was the first but not the only occurrence
of meaningless infinity in seemingly meaningful physical
equations. Such infinities consistently emerge in field
theory, both classical and quantum. An interesting
mathematical fact is that if we consider field theories in
space-time of arbitrary dimension $d$, then infinities only
occur for (small) integer $d$, in particular, for the physically
meaningful $d = 4$, but they do not occur for $d = 4 - \varepsilon$
for a small $\varepsilon > 0$. Currently, this idea is used as a formal
trick, to compute the physical quantities by using fractal
dimensions, but it is reasonable to conclude, from this result,
that the actual dimension of space-time is fractal, i.e., that
space-time indeed has a fractal structure [4]. Since
space-time is also homogeneous, this conclusion means, crudely
speaking, that not all points from the 4-D continuum
describe events from the actual space-time, but that these
events actually form a hierarchical structure.

4.  MULTI-RESOLUTION  DATA  PRO- 
CESSING: PROBLEMS  AND  CHAL- 
LENGES 

Traditional  statistical  methods  are  based  on  non- 
hierarchical  data.  Traditional  statistical  methods  treat 
the  entire  data  processing  as  a  single  process,  going  from 
input  (initial  data)  to  the  output  (classification  or  values 
of  different  quantities).  To  be  more  precise,  there  exist 
multi-step  methods,  but  these  are  methods  that  simplify 
the  computations  at  the  expense  of  the  artificially  added 
hierarchical  structure,  and  not  by  using  the  actual  hierar- 
chical structure. 

Traditional statistical methods and multi-resolution
data processing: a problem. Since traditional
statistical methods are oriented towards one-step data
processing, when we have a multi-resolution, multi-stage
processing, we apply the traditional statistical methods to
each stage separately, as if at each stage, we start with
the raw data, and return the final results of data
processing.

In reality, after, e.g., the first step of data processing, we do
not have raw data any more, we have pre-processed data;
due to this pre-processing, the error probability
distribution for pre-processed data is different from the typical error
probability distribution for raw data, and therefore, strictly speaking,
traditional methods are no longer applicable.
This problem is very urgent for processing environmental
data, especially for processing earth-based environmental
data (that usually supplements the data coming from
satellite imaging). This data, usually, does not come directly
from measurements: the raw measurement results are
processed and generalized; then measurements corresponding
to a certain small area are processed together, etc.; quite
a few levels of data processing pass before we even get the
data.

2-D example. How does this multi-stage processing
affect the results? Let us give a simple illustrative example.
We have already mentioned (when describing Tsirelson's
result) that, from the geometric viewpoint, standard data
processing techniques correspond to finding the point $s$
from the a priori set $A$ that is the closest to the
observation point $o$ (the closest in the sense of either the standard
Euclidean distance or of its multi-dimensional analogue).
Let us start, for simplicity, with a 2-D case ($d = 2$), and
let us consider an a priori ellipse

$A = \{(x_1, x_2) \mid x_1^2/A^2 + x_2^2 = R^2\}.$

To get a point $s \in A$ is the ultimate goal of data processing.
At each intermediate step, we achieve this goal only partly,
i.e., we get an intermediate point from some larger set; at
each stage, this set gets smaller and smaller until finally,
we get a point from the desired set $A$. It is natural to
assume that these intermediate sets are also ellipses that
are similar to the desired set $A$, with the only difference
that they correspond to larger values $R$; from one step of
data processing to the next, the value $R$ gets smaller and
smaller until we reach the desired value.
In other words, we start with a point $o$ that comes from
observations. On the first step of data processing, we find a
point $s^{(1)}$ from the ellipse $A^{(1)}$ (corresponding to the value
$R^{(1)}$) that is the closest to $o$. On the second step, we find
the point $s^{(2)}$ from the ellipse $A^{(2)}$ (corresponding to the
value $R^{(2)} < R^{(1)}$) that is the closest to $s^{(1)}$, etc. After
$N$ processing steps, we get a point $s^{(N)}$ from the desired
ellipse $A$. This point is the closest to the previous point
$s^{(N-1)}$; as one can easily see geometrically, this point
is not necessarily the closest to the original observation $o$
from all the points from $A$. In other words, the result
of the multi-stage processing algorithm is different from the
desired point $s$. How different is it? And how can we
compensate for this difference?
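
The difference between multi-stage and one-step processing can be seen in a short numerical sketch. The closest point on an ellipse is found here by a simple numerical search, and all concrete numbers (the semi-axis parameter, the observation, the number of stages, the linear schedule of radii) are hypothetical values chosen for illustration.

```python
import math


def closest_on_ellipse(p, a, r, samples=2000):
    """Numerically find the point of the ellipse x1^2/a^2 + x2^2 = r^2 closest to p."""
    def point(t):
        return (a * r * math.cos(t), r * math.sin(t))

    def dist2(t):
        q = point(t)
        return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2

    # Coarse search over the angle parameter t, followed by a ternary-search
    # refinement inside the bracket around the best grid value.
    step = 2 * math.pi / samples
    best = min(range(samples), key=lambda k: dist2(k * step))
    lo, hi = (best - 1) * step, (best + 1) * step
    for _ in range(60):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if dist2(m1) < dist2(m2):
            hi = m2
        else:
            lo = m1
    return point((lo + hi) / 2)


def multi_stage(o, a, radii):
    """Project o onto a sequence of shrinking ellipses, one stage at a time."""
    p = o
    for r in radii:
        p = closest_on_ellipse(p, a, r)
    return p


def distance(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical numbers chosen for illustration only.
A_AXIS, R_FINAL, STAGES = 2.0, 1.0, 50
observation = (3.0, 3.0)
r_start = math.sqrt(observation[0] ** 2 / A_AXIS ** 2 + observation[1] ** 2)
radii = [r_start + (R_FINAL - r_start) * k / STAGES for k in range(1, STAGES + 1)]

staged = multi_stage(observation, A_AXIS, radii)              # result of N-stage processing
direct = closest_on_ellipse(observation, A_AXIS, R_FINAL)     # desired one-step result

if __name__ == "__main__":
    print("staged:", staged, "distance to o:", distance(staged, observation))
    print("direct:", direct, "distance to o:", distance(direct, observation))
```

Both points lie on the final ellipse, but the staged result ends up farther from the observation than the direct one, which is the bias discussed in the text.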

To simulate the effect of a large number $N$ of stages on
the result of data processing, let us consider the limit of
infinitely many stages. In this limit, instead of finitely
many different ellipses $A^{(1)}, \ldots, A^{(N)}$ that correspond to
decreasing values $R^{(1)} > \ldots > R^{(N)} = R$, we get a
continuous family of ellipses that correspond to decreasing values
of the parameter $R$. Similarly, instead of finitely many
intermediate results $s^{(1)}, \ldots, s^{(N)}$ of data processing, we get
a continuous family of points. Geometrically, this
continuous family of points forms a curve $x_2 = f(x_1)$.
To describe this continuous process, let us describe how the
"next" point on the curve (that describes different stages
of data processing) is related to the "previous" one, i.e.,
to be more precise, let us describe a differential equation
for this curve. Each point $(x_1, x_2)$ on this curve belongs to
a certain ellipse (with the value $R = \sqrt{x_1^2/A^2 + x_2^2}$). To
get the "next" point on this curve, we consider a slightly
smaller ellipse, with the parameter $R - \Delta R$, and take the
point on that smaller ellipse that is the closest to the given
one. The straight line that connects the original point with
the "next" one is, in geometric terms, a tangent to the
curve. It is well known from geometry that the straight
line segment from any point to its closest point on any
surface or curve (in particular, on an ellipse) is orthogonal
to this surface or curve (i.e., to the tangent to this surface
or curve). Thus, at any point, the tangent to the desired
curve is orthogonal to the tangent to the ellipse that this
curve currently passes through.

The tangent to the ellipse $x_1^2/A^2 + x_2^2 = \mathrm{const}$ can be
obtained if we differentiate both sides of the ellipse's
equation: $(2x_1/A^2)\,dx_1 + 2x_2\,dx_2 = 0$. Dividing both sides
by 2, we get $(x_1/A^2)\,dx_1 + x_2 \cdot dx_2 = 0$. The
orthogonal line to this tangent is, therefore, described by the
equation $dx_1/(x_1/A^2) = A^2\,dx_1/x_1 = dx_2/x_2$. This
differential equation can be easily integrated, leading to
$c + \alpha \cdot \ln(x_1) = \ln(x_2)$, where $\alpha = A^2$ and $c$ is an
arbitrary constant, i.e., to $x_2 = C \cdot x_1^{\alpha}$ (where we denoted
$C = \exp(c)$).

We can use this equation to correct the effects of
multi-stage data processing: namely, if we know the values
$(x_1, x_2)$ that correspond to several different stages of data
processing, then we can:

•  use the least squares method to find the parameters
$\alpha$ and $c$ from the equation $\ln(x_2) = c + \alpha \cdot \ln(x_1)$;

•  use the resulting formula $x_2 = \exp(c) \cdot x_1^{\alpha}$ to
reconstruct the original values $x_1$ and $x_2$; and then,

•  use the least squares method again to find the point $s$
on the ellipse $A$ that is the closest to the original
observations $(x_1, x_2)$.
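
The first two steps of this correction can be sketched as follows. This is a minimal illustration with our own function names: an ordinary least squares fit of the straight line $\ln(x_2) = c + \alpha \cdot \ln(x_1)$, followed by evaluating the fitted curve. The `extrapolate` helper assumes that one coordinate of the original observation is known, which the text does not spell out.

```python
import math


def fit_power_law(points):
    """Ordinary least squares fit of ln(x2) = c + alpha * ln(x1).

    points: iterable of (x1, x2) pairs with positive coordinates.
    Returns (alpha, c).
    """
    logs = [(math.log(x1), math.log(x2)) for x1, x2 in points]
    n = len(logs)
    mean_u = sum(u for u, _ in logs) / n
    mean_v = sum(v for _, v in logs) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in logs)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in logs)
    alpha = s_uv / s_uu
    c = mean_v - alpha * mean_u
    return alpha, c


def extrapolate(alpha, c, x1):
    """Value of x2 predicted by the fitted curve x2 = exp(c) * x1**alpha."""
    return math.exp(c) * x1 ** alpha


if __name__ == "__main__":
    # Hypothetical intermediate results lying on the curve x2 = x1**4 / 27.
    stages = [(x, x ** 4 / 27) for x in (1.9, 2.2, 2.5, 2.8)]
    alpha, c = fit_power_law(stages)
    print(alpha, math.exp(c), extrapolate(alpha, c, 3.0))
```

With noisy intermediate results the fit would only be approximate; the remaining step is then a standard closest-point search on the ellipse $A$.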

Multi-dimensional case. In a 2-D case, we get
reasonably simple formulas. It turns out that in a more realistic
multi-D case, the resulting formulas are only slightly more
complicated. Indeed, if instead of ellipses, we consider
ellipsoids

$A = \{(x_1, \ldots, x_n) \mid x_1^2/A_1^2 + \ldots + x_n^2/A_n^2 = R^2\},$

then a similar orthogonality condition means that a
tangent to the curve (that represents the consecutive
intermediate results of multi-stage data processing) is orthogonal
to the surface of the ellipsoid. This condition leads to a
system of equations

$\frac{dx_1}{x_1/A_1^2} = \frac{dx_2}{x_2/A_2^2} = \ldots = \frac{dx_n}{x_n/A_n^2},$

from which we conclude that $\ln(x_i) = \alpha_i \cdot \ln(x_1) + c_i$, i.e.,
that $x_i = C_i \cdot x_1^{\alpha_i}$.

Surprising emergence of symmetries. An interesting
side effect of our analysis is that the resulting curve has
an unexpected symmetry: namely, if we change a unit in
which we measure $x_1$ to a unit that is $\lambda > 0$ times smaller
(i.e., if we replace $x_1$ by $x_1' = \lambda \cdot x_1$), then we get exactly
the same formulas for the relationship between $x_i$ if we
appropriately change the units for all other variables $x_i$.
Moreover, the relationship $x_i = C_i \cdot x_1^{\alpha_i}$ is the only possible
relationship with this property.
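
The invariance can be verified directly. In the sketch below, the rescaling factor $\lambda^{\alpha_i}$ for the variable $x_i$ is our own worked-out choice of the "appropriate" change of units; the text does not state it explicitly.

```latex
\[
  x_1' = \lambda \cdot x_1, \qquad
  x_i' = \lambda^{\alpha_i} \cdot x_i
  \quad\Longrightarrow\quad
  x_i' = \lambda^{\alpha_i} \cdot C_i \cdot x_1^{\alpha_i}
       = C_i \cdot (\lambda \cdot x_1)^{\alpha_i}
       = C_i \cdot (x_1')^{\alpha_i} .
\]
```

So the power-law relationship keeps exactly the same form, with the same constants $C_i$ and $\alpha_i$, in the new units.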

This  particular  symmetry  is  very  important  (for  numerous 
examples  of  using  this  and  more  complicated  symmetries 
in  computer  science  and  data  processing,  see,  e.g.,  [14]). 
The very fact that this important symmetry comes as a
consequence of the hierarchical structure of data
processing makes us believe that, maybe, symmetry in general,
with all its important applications in physics and in other
areas, can be explained based on the granular hierarchical
structure of the Universe.

Abstract

This  paper  introduces  a  method  for  evaluating  the 
planning  complexity.  Finding  the  optimal  branching 
allows  for  evaluation  of  the  optimum  number  of  levels 
in  a  hierarchy  of  planning. 

I.  Problem  Definition 

A  classical  planning  problem  can  be  characterized 
by  a  number  of  tasks  to  be  performed  (t),  a  number 
of  resources  that  can  perform  those  tasks  (r)  and  the 
value  of  a  cost  function  to  be  minimized  ($).  This 
description  is  general  enough  that  tasks  and  resources 
can  be  of  different  levels  of  coarseness,  and  thus  make 
the  solution  more  general. 

The  duty  of  the  planner  is  to  assign  the  tasks  to 
the  resources  and  schedule  the  times  at  which  these 
tasks  should  be  performed  so  that  they  minimize  the 
cost  function  [1,  2,  3]. 

Earlier, the planning of a minimum-cost trajectory in a
graph was demonstrated with the process of planning
organized in a hierarchical fashion, and the
optimum number of levels was obtained [4]. In this
paper, we find the optimum number of levels
for a multi-resolutional hierarchy of planning.

An "assignment" is a task-resource couple. A plan is
defined as a time-tagged concatenation of assignments.
A "decision" is the set of all assignments in a plan with
the same time tag, so a plan is a list of decisions. The
"first decision" has the earliest time tag, the "second
decision" has the second earliest, and so on.

Planners will create possible plans, investigate them
and choose the least expensive. The complexity of the
planner depends on the maximum number of
combinatorially possible plans that can be created. In the
worst case, optimal planners will study all plans.

In this paper, full allocation is assumed:
no task will be left unprocessed if there is
a free resource. This assumption reduces
the number of plans calculated by the planner.

II. One Level Planning

Theorem II.1 If the number of tasks t is smaller
than the number of resources r, then the worst case
number of decisions that the planner investigates is

$$D(t,r) = \frac{r!}{(r-t)!}.$$

Proof. The number of combinations of resources in
tasks is $C_t^r = \binom{r}{t} = \frac{r!}{t!\,(r-t)!}$. In order to find all the
possible assignments, we have to permute each of
those combinations:

$$D(t,r) = C_t^r \cdot P_t = \frac{r!}{t!\,(r-t)!} \cdot t! = \frac{r!}{(r-t)!} \qquad (1)$$

Since t < r, the planner only needs to make one
decision to complete the plan; it is not necessary to
assign more than one task to any resource.

In most cases, t > r; thus plans will have more than
one decision.

Theorem II.2 If the number of tasks t is larger than
the number of resources r, then there are
$D(t,r) = \frac{t!}{(t-r)!}$ possible first decisions among all possible plans.

Proof. The number of combinations of tasks in
resources is $C_r^t = \binom{t}{r} = \frac{t!}{r!\,(t-r)!}$. In order to find all the
possible assignments, we have to permute each of
those combinations:

$$D(t,r) = C_r^t \cdot P_r = \frac{t!}{r!\,(t-r)!} \cdot r! = \frac{t!}{(t-r)!} \qquad (2)$$
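
As a sanity check on Equations 1 and 2, the following minimal sketch compares the closed forms with a brute-force count of one-to-one pairings. The function names are illustrative and not part of the paper; the brute-force count assumes that a decision pairs each member of the scarcer side (tasks or resources) with a distinct member of the other side. Both formulas give the same value when t = r.

```python
from itertools import permutations
from math import factorial


def first_decisions(t: int, r: int) -> int:
    """Closed form of Equations 1 and 2: max(t, r)! / |t - r|!."""
    big, small = max(t, r), min(t, r)
    return factorial(big) // factorial(big - small)


def first_decisions_brute_force(t: int, r: int) -> int:
    """Count ordered selections of min(t, r) items out of max(t, r)."""
    big, small = max(t, r), min(t, r)
    return sum(1 for _ in permutations(range(big), small))


# Compare both counts on a few small cases (t < r, t > r, t = r).
for t, r in [(2, 4), (4, 2), (3, 3), (5, 3)]:
    assert first_decisions(t, r) == first_decisions_brute_force(t, r)
```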

Theorem II.3 If the number of tasks t is equal to the
number of resources r, then the worst case number of
decisions that the planner investigates is

$$D(t,r) = r! = t! \qquad (3)$$

Proof. In this case, we have a simple permutation.
Note that both Equations 1 and 2 reduce to Equation 3
when t = r.

Theorem II.4 The worst case number of plans that
the planner must investigate in a one level system
where t > r is

$$P(t,r) = \frac{r!}{(r-m)!} \prod_{i=1}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} \qquad (4)$$

where $m = t \bmod r$ and $n = \mathrm{int}(t/r)$.

Proof.  Since  plans  are  strings  of  decisions,  the  total 
number  of  plans  the  planner  must  investigate  in  the 
worst  case  can  be  found  by  multiplying  the  number  of 
different  first  decisions  times  the  number  of  different 
second  decisions,  and  so  forth. 


$$P(t,r) = \prod_{i=1} D_i \qquad (5)$$

where $D_i$ is the number of i-th decisions. For the
number of first decisions, Equation 2 shows that
$D_1 = \frac{t!}{(t-r)!}$ assignments are created. For the second
decisions, the number of resources is the same, but r
tasks have already been processed by the first decision.
Thus, if $t - r \geq r$, $D_2 = \frac{(t-r)!}{(t-2r)!}$, or if $t - r < r$,
by using Equation 1, $D_2 = \frac{r!}{(r - t \bmod r)!}$. The same
thing happens for the third decision: $D_3 = \frac{(t-2r)!}{(t-3r)!}$ if
$t - 2 \cdot r \geq r$, or $D_3 = \frac{r!}{(r - t \bmod r)!}$ if $t - 2 \cdot r < r$.

We need to multiply Equation 2 n times, adjusting
for the previously processed tasks, and then multiply by
the number of possible last decisions created by the
leftover (t mod r) tasks using Equation 1:

$$P(t,r) = \underbrace{\frac{r!}{(r-m)!}}_{\text{last decisions}} \cdot \underbrace{\prod_{i=1}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!}}_{\text{first } n \text{ decisions}} \qquad (6)$$

where $m = t \bmod r$ and $n = \mathrm{int}(t/r)$. Please note
that if t < r, then n = 0, the part labeled as "first n
decisions" disappears, and m = t. Therefore,
Equation 6 becomes Equation 1. If t = r then m = 0, the
part labeled "last decisions" disappears, and n = 1.
Therefore, Equation 6 becomes Equation 3.
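
The two limiting cases noted above can be checked numerically. The sketch below evaluates Equation 6 directly; the function name `plans_loosely_coupled` is illustrative and does not come from the paper.

```python
from math import factorial


def plans_loosely_coupled(t: int, r: int) -> int:
    """Worst-case number of plans according to Equation 6."""
    n, m = divmod(t, r)  # n = int(t / r), m = t mod r
    first_n = 1
    for i in range(1, n + 1):
        # Number of i-th decisions while at least r tasks remain.
        first_n *= factorial(t - (i - 1) * r) // factorial(t - i * r)
    # Last decisions, created by the m leftover tasks (Equation 1).
    last = factorial(r) // factorial(r - m)
    return last * first_n


# t < r reduces to Equation 1, and t = r reduces to Equation 3.
assert plans_loosely_coupled(2, 4) == factorial(4) // factorial(2)
assert plans_loosely_coupled(3, 3) == factorial(3)
print(plans_loosely_coupled(5, 2))
```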

III.  Using  the  Scheduler  to  Reduce  the 
Number  of  Investigated  Plans 

In Section II, the planner worked in two distinct
stages. First, the tasks are assigned to the resources
until all tasks are finished. This creates all the
possible strings of assignments. Then, the scheduler
takes these strings and assigns time tags to the
task-resource couples. This creates plans that can be
evaluated against the cost function. In [5], a planning
method is presented in which the assignment of the tasks
is interwoven with the scheduler. There, this planner is
called a tightly coupled JA-SC planner, where JA-SC
means "Job Assigner-SCheduler." The scheduler is
invoked after each decision, so this planning procedure
knows (or estimates) which resource will finish
earliest in all the decisions. Reference [5] also assumes
"full allocation," where no unprocessed task (that is within the
assigned ordering constraints) and no free resource will
be left unassigned.

Theorem III.1 The worst case number of plans that
a tightly coupled JA-SC planner must investigate in
one level where t > r is

$$P(t,r) = t! \qquad (7)$$

assuming that no two resources are available at exactly
the same time except in the first decisions.

Proof. For this case, Equation 5 still applies. For
the set of first decisions, the tightly coupled JA-SC
planner behaves the same. Equation 2 shows that
$D_1 = \frac{t!}{(t-r)!}$ assignments are created. For the set of
second decisions, however, the scheduler, which investigates
each of the assignments in the first decisions,
can estimate which resource finishes the assigned task
first. At that time, only one resource is available and
there are $t - r$ tasks to process; therefore, $D_2 = t - r$. Similarly,
$D_3 = t - r - 1$. In turn,

$$P(t,r) = \frac{t!}{(t-r)!} \cdot (t-r) \cdot (t-r-1) \cdot \ldots \cdot 1 = t!$$


The  ratio  between  the  plans  explored  by  the  loosely 
and  tightly  coupled  JA-SC  planners  can  be  expressed 
as  follows: 


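$$\frac{P(t,r)_L}{P(t,r)_T} = \frac{\frac{r!}{(r-m)!} \prod_{i=1}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!}}{t!} = \frac{r!}{(r-m)!\,(t-r)!} \prod_{i=2}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} \qquad (8)$$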


Figure 1 shows this ratio plotted for different numbers
of tasks and resources. As expected, the ratio is one
when t <= r, but it quickly increases with an increasing
number of tasks and resources.
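
The ratio can be tabulated with a short sketch that divides Equation 4 by Equation 7. The function names are illustrative. Since Theorem III.1 is stated only for t > r, the sketch assumes that for t <= r the tightly coupled planner explores the same number of plans as the loosely coupled one, which matches the statement that the ratio is one in that range.

```python
from fractions import Fraction
from math import factorial


def plans_loose(t: int, r: int) -> int:
    """Loosely coupled planner, Equation 4."""
    n, m = divmod(t, r)
    total = factorial(r) // factorial(r - m)
    for i in range(1, n + 1):
        total *= factorial(t - (i - 1) * r) // factorial(t - i * r)
    return total


def plans_tight(t: int, r: int) -> int:
    """Tightly coupled planner: Equation 7 for t > r.

    Assumption of this sketch: for t <= r both planners coincide.
    """
    return factorial(t) if t > r else plans_loose(t, r)


def plan_ratio(t: int, r: int) -> Fraction:
    """Ratio of loosely to tightly coupled plan counts (Equation 8)."""
    return Fraction(plans_loose(t, r), plans_tight(t, r))


for t, r in [(2, 4), (3, 3), (5, 2), (10, 4)]:
    print(t, r, plan_ratio(t, r))
```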

IV.  Effects  of  Ordering  Constraints 

Very seldom is a planning problem set up with
no constraints on the order of task execution.
In Sections II and III, constraints were not considered
when calculating the number of plans created
and their complexity. Introducing constraints changes
the complexity of the planner in two main ways:

1. The most obvious change is that the number of
tasks and resources available to execute at any
point of time is reduced. This reduces the total
number of plans. Unfortunately, because there are
various kinds of constraints, it is not a simple job
to determine the effect that many ordering constraints
have on the total number of plans.
2. The other change causes Equation 7 to overstate
the number of decisions created. Suppose there
is a set of tasks waiting to be processed that
cannot be executed because of an ordering constraint.
Once the constraint is satisfied, the planner
will have to deal with all the new tasks that
are now available for processing and with one or
more available resources. If we have more
than one resource available and more than one
task ready to execute, then the conditions of
Theorem III.1 are not satisfied. In those cases, the
worst case number of plans lies between Equation 4
and Equation 7. Note that the tightly
coupled JA-SC planner cannot be more complex
than the loosely coupled one, since the loosely coupled
planner creates all the possible alternatives.

Figure 1: Ratio of the number of plans of the
loosely coupled to the tightly coupled JA-SC planners

Theorem IV.1 Every ordering constraint imposed
in a planning problem can only decrease the number
of plans that need to be investigated.

Proof. Equation 4 shows the total number of plans;
all possible alternatives are counted. Among those
possible alternatives, some follow the newly imposed
constraints and some do not. No new plans are
created, because Equation 4 already counts every possible
plan. By imposing the constraint, we should only
count the plans that follow the constraint, and their number is
smaller than (or equal to) the original number of plans.

Theorem IV.2 If t > r, then introducing an ordering
constraint of the form "$t_i$ must be processed
after $t_j$ is finished," where $t_i$ and $t_j$ are two tasks,
reduces the number of plans by at least a factor of $\frac{t}{t-r}$.

Proof. By expanding Equation 4 we find that

$$P(t,r) = \frac{t!}{(t-r)!} \cdot \frac{(t-r)!}{(t-2 \cdot r)!} \cdot \frac{(t-2 \cdot r)!}{(t-3 \cdot r)!} \cdots \qquad (9)$$

but since we know that in the first decision, task $t_i$
cannot be processed since $t_j$ has not been processed,
the number of decisions made in the first decisions
is $\frac{(t-1)!}{(t-r-1)!}$. The second decisions become more
complex, since there are some alternatives in the set of
first decisions that have processed $t_j$ and some that have
not. The total number of alternatives in the set of second
decisions is smaller than $\frac{(t-r)!}{(t-2 \cdot r)!}$, since this expression
contains the number of all alternatives of the second
decisions. Then we find the ratio using only the
first term,

$$\frac{P(t,r)}{P(t,r)_1} = \frac{\frac{t!}{(t-r)!}}{\frac{(t-1)!}{(t-r-1)!}} = \frac{t}{t-r} \qquad (10)$$
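
The first-term ratio of Equation 10 can be confirmed by enumeration. The sketch below is illustrative only: it models a first decision as an ordered selection of r tasks out of t, and models the ordering constraint by excluding one blocked task (a hypothetical index 0) from every first decision.

```python
from fractions import Fraction
from itertools import permutations


def first_decision_ratio(t: int, r: int, blocked: int = 0) -> Fraction:
    """Ratio of unconstrained to constrained first decisions for t > r."""
    tasks = range(t)
    unconstrained = sum(1 for _ in permutations(tasks, r))
    # The blocked task cannot appear in the first decision.
    constrained = sum(1 for d in permutations(tasks, r) if blocked not in d)
    return Fraction(unconstrained, constrained)


# Agreement with t / (t - r) on two small cases.
assert first_decision_ratio(5, 2) == Fraction(5, 5 - 2)
assert first_decision_ratio(6, 3) == Fraction(6, 6 - 3)
```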


V.  Complexity 


If we assume the following, the complexity of the
planner can be represented as a polynomial:

•  the complexity of the planner is directly proportional to
the number of times that the simulator is invoked.
This simulator has a complexity of $\Theta_S$ necessary
to find the next state for each explored decision

•  the complexity of the planner is directly proportional
to the complexity of the combinatorics that
are necessary to create each assignment. In this
case, since the nodes in the hierarchy are composed
of decisions and we assume a loosely coupled
JA-SC case, the complexity of a decision is
$\Theta_D = t\,\Theta_A$, where $\Theta_A$ is the complexity of creating
an assignment.

For example, if we have one resource and one task,
the complexity of the planner is $\Theta_P = \Theta_S + \Theta_D$. For
this paper, we assume that $\Theta_S + \Theta_D$ is a constant.

Theorem V.1 If the complexity of the planning algorithm
$\Theta_P(t,r)$ is directly proportional to the complexity
of each investigated decision $\Theta_D$ and its
simulation $\Theta_S$, then, assuming a loosely coupled JA-SC
planner:

$$\Theta_P(t,r) = (\Theta_D + \Theta_S) \left[ \sum_{j=1}^{n} \left[ \prod_{i=1}^{j} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} \right] + \frac{r!}{(r-m)!} \prod_{i=1}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} \right] \qquad (11)$$

where $m = t \bmod r$ and $n = \mathrm{int}(t/r)$.

Proof. Since the complexity of the planner depends
on the number of assignments explored,

$$\Theta_P(t,r) = (\Theta_D + \Theta_S) \cdot \sum_{j} (\text{different } j\text{th decisions})$$

We use Equation 4 to calculate the number of decisions
at each decision column. Since in the first
n − 1 decision columns there are more unprocessed
tasks than resources available, the part labeled "last
decisions" in Equation 6 cancels out. This "last decisions"
part will only count for the final decision column.
That is,

$$\Theta_P(t,r) = (\Theta_D + \Theta_S) \left[ \frac{t!}{(t-r)!} + \prod_{i=1}^{2} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} + \ldots + \prod_{i=1}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} + \frac{r!}{(r-m)!} \prod_{i=1}^{n} \frac{[t-(i-1)r]!}{[t-i \cdot r]!} \right] \qquad (12)$$

which reduces to Equation 11.
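
A minimal sketch of Equation 11 follows, with $\Theta_D + \Theta_S$ taken as one unit of complexity. The function name is illustrative. One point is an assumption of this sketch rather than something stated in the theorem: the "last decisions" term is added only when m > 0, that is, when leftover tasks exist. With that reading the sketch reproduces the values $\Theta_P(3,2) = 18$ and $\Theta_P(3,3) = 6$ quoted later in Section VII.

```python
from math import factorial


def theta_one_level(t: int, r: int, unit: int = 1) -> int:
    """One-level planning complexity per Equation 11.

    `unit` stands for the constant Theta_D + Theta_S.
    """
    n, m = divmod(t, r)  # n = int(t / r), m = t mod r
    total = 0
    product = 1
    for j in range(1, n + 1):
        product *= factorial(t - (j - 1) * r) // factorial(t - j * r)
        total += product  # decisions explored in decision column j
    if m > 0:
        # Assumption: the last column exists only if tasks are left over.
        total += product * factorial(r) // factorial(r - m)
    return unit * total


assert theta_one_level(3, 2) == 18
assert theta_one_level(3, 3) == 6
```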

VI.  Hierarchical  Planning 

The benefits of hierarchical control systems are obvious.
However, there is no consistent theory of hierarchical
system design or optimization, and general
recommendations are often based upon heuristics or
intuition. Hierarchical control systems are based on the
idea that tasks and resources can be clustered into
lower resolution tasks and resources. NIST-RCS is an
example of how these control structures can be organized.

In order to go from a one level planner to a system
with multiple levels, let us start by assuming the
following:

•  The tasks are grouped into a clusters. These clusters are
lower resolution tasks.

•  The resources are grouped into b resource clusters. We
will call these clusters lower resolution resources.

•  The number of tasks in each lower resolution task
is approximately equal.

•  The number of resources in each lower resolution
resource is approximately equal.

Theorem VI.1 In a two level control hierarchy, the
worst case planning complexity created by both levels
is

$$\Theta_{2\,lev}(r,t) = \Theta(a,b) + b \cdot \Theta\left(\mathrm{ceil}\left(\frac{t}{a}\right), \mathrm{ceil}\left(\frac{r}{b}\right)\right) \qquad (13)$$

Proof. The total planning complexity should be
equal to the complexity created by the low resolution
level plus the complexity created in the high resolution
level. The complexity created in the low resolution
level is $\Theta(a,b)$, since we have a low resolution
tasks and b low resolution resources. When the planner
of this lower resolution level is finished and there is
one low resolution plan that is chosen based upon the
cost function, this plan goes to the higher resolution
level. This level has b control nodes, each one with
its own planner. The number of low resolution tasks
sent to the higher resolution level is a, so the number
of high resolution tasks that have to be planned
in each low resolution resource is $t/a$, and they should
be processed by $r/b$ high resolution resources. The ceil
function is used to round up. This compensates for
the branching factor, which may not exactly divide the
number of tasks or the number of resources.

Theorem VI.2 In an l level control hierarchy, the
worst case planning complexity created in all levels
is

$$\Theta_{l\,lev}(r,t) = \sum_{i=0}^{l-2} b^i \cdot \Theta(a,b) + b^{l-1} \cdot \Theta\left(\mathrm{ceil}\left(\frac{t}{a^{l-1}}\right), \mathrm{ceil}\left(\frac{r}{b^{l-1}}\right)\right) \qquad (14)$$

Proof. The lowest level in the hierarchy will always
create $\Theta(a,b)$ complexity regardless of the number of levels
that we have. If l > 2 then the second level of the
hierarchy will have $b \cdot \Theta(a,b)$. If l > 3 then the third
level will have $b^2 \cdot \Theta(a,b)$ complexity. In other words,
$\sum_{i=0}^{l-2} b^i \cdot \Theta(a,b)$ will account for all the complexity
created at each level of the l level control hierarchy
except the highest level of resolution. This last level
will have $b^{l-1} \cdot \Theta\left(\mathrm{ceil}\left(\frac{t}{a^{l-1}}\right), \mathrm{ceil}\left(\frac{r}{b^{l-1}}\right)\right)$ complexity,
since there are $\mathrm{ceil}\left(\frac{t}{a^{l-1}}\right)$ highest resolution tasks in
each cluster, and $\mathrm{ceil}\left(\frac{r}{b^{l-1}}\right)$ resources in each cluster.
Thus Equation 14.
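
Equation 14 can be evaluated with a few lines of code. The sketch below is illustrative: the function names are not from the paper, $\Theta_D + \Theta_S$ is taken as one unit, and the one-level complexity uses Equation 11 under the assumption that the "last decisions" term is counted only when leftover tasks exist (m > 0). With one level, the sum is empty and the expression falls back to the one-level complexity.

```python
from math import factorial


def theta_one_level(t: int, r: int) -> int:
    """Equation 11 with Theta_D + Theta_S = 1 (last term only if m > 0)."""
    n, m = divmod(t, r)
    total, product = 0, 1
    for j in range(1, n + 1):
        product *= factorial(t - (j - 1) * r) // factorial(t - j * r)
        total += product
    if m > 0:
        total += product * factorial(r) // factorial(r - m)
    return total


def ceil_div(x: int, y: int) -> int:
    """Integer ceiling of x / y."""
    return -(-x // y)


def theta_hierarchy(t: int, r: int, a: int, b: int, levels: int) -> int:
    """Worst-case complexity of an l-level hierarchy per Equation 14."""
    # All levels except the highest resolution one.
    coarse = sum(b ** i for i in range(levels - 1)) * theta_one_level(a, b)
    # Highest resolution level: b^(l-1) planners on the clustered problem.
    fine_tasks = ceil_div(t, a ** (levels - 1))
    fine_resources = ceil_div(r, b ** (levels - 1))
    fine = b ** (levels - 1) * theta_one_level(fine_tasks, fine_resources)
    return coarse + fine


print(theta_hierarchy(10, 5, a=5, b=5, levels=2))
print(theta_hierarchy(10, 5, a=2, b=2, levels=3))
```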

VII.  Optimizing  the  Control  Hierarchy 

Important questions include the following: Given
t and r, what is the best clustering criterion? What
is the optimal number of levels that should be used
to minimize complexity? The answers come from
numerically inspecting the complexity equations.

Figure 2: $\log\left(\frac{\Theta_P}{\Theta_S + \Theta_D}\right)$ for a one level control system
as a function of r and t

Figure 3: $\log\left(\frac{\Theta_P}{\Theta_S + \Theta_D}\right)$ for a one level control system
for 20 tasks as a function of r

The following experiment was conducted. The complexity
of a one level system was calculated (Figure 2)
as a function of the number of tasks and resources.
The graph was calculated by finding the logarithm
of Equation 11 by assigning l = 2, assuming
$\Theta_S + \Theta_D$ constant and equal to one unit of complexity.
The folds that this figure presents (see Figure 3)
can be explained by the fact that the number
of assignments can decrease when the number of
resources increases. An obvious example of this is
$\Theta_P(3,2) = 18 > \Theta_P(3,3) = 6$ (using Equation 11),
which can be seen in the big fold at r = t; the other
folds occur when r is a multiple of t.

Equation 14 was used to find the optimal a and b
for each possible combination of t and r within the
shown range. These optimal values were found by
using brute force search over the a and b spaces.
Figure 4 and Figure 5 show these optimum values after
the search procedure is finished. The results are
non-monotonically increasing as r (in the case of a
in Figure 4) and t (in the case of b in Figure 5) are
increasing.

Figure 4: Results of searching for the best a as a
function of r, t and b

Figure 5: Results of searching for the best b as a
function of r, t and a

Then, using these optimal a and b, the complexity
of the 2 level system was found for the same range
of t and r. The results are shown in Figure 6. Compare
the difference in complexity with the one level
system shown in Figure 2. This same procedure can
be repeated for more levels.

Although it is interesting to find the optimal
clustering criterion for a given number of tasks and
resources, finding the optimal number of levels for a
certain number of tasks and resources has more
applications. This is done in the following way:

1. Given the number of tasks t and the number
of resources r, the optimal a and b are numerically
found by searching using Equation 14 for
2, 3, 4, ..., 10 levels.
2. The minimum complexity versus level function is
numerically found.

Figure 6: $\log\left(\frac{\Theta_P}{\Theta_S + \Theta_D}\right)$ for a two level control system
as a function of r and t and using optimal a and b

Figure 7: $\log\left(\frac{\Theta_P}{\Theta_S + \Theta_D}\right)$ as a function of the number
of levels (1-10) for some fixed combinations of r and t

Figure 8: $\log\left(\frac{\Theta_P}{\Theta_S + \Theta_D}\right)$ as a function of the number
of levels (2-10) for some fixed combinations of r and t

Figure 7 shows 4 examples: {t = 50, r = 30},
{t = 50, r = 5}, {t = 10, r = 30}, {t = 10, r = 5}.
Figure 8 shows the same data excluding the one level
case. If both t and r increase, the general rule is that
the number of levels necessary to minimize the complexity
will also increase. Different numbers of levels create
significantly different complexity, and this curve
determines the optimum number of levels necessary
to minimize complexity. Some overhead is created by
adding levels of resolution; this overhead is only partly
accounted for in Equation 14, where extra terms may
be necessary.
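
The two-step procedure above can be sketched as a brute-force search. Everything named here is illustrative: the search ranges (a from 1 to t, b from 1 to r) are an assumption, since the text does not state them, and the one-level complexity again uses Equation 11 with $\Theta_D + \Theta_S = 1$ and the "last decisions" term counted only when m > 0.

```python
from math import factorial


def theta_one_level(t: int, r: int) -> int:
    """Equation 11 with Theta_D + Theta_S = 1 (last term only if m > 0)."""
    n, m = divmod(t, r)
    total, product = 0, 1
    for j in range(1, n + 1):
        product *= factorial(t - (j - 1) * r) // factorial(t - j * r)
        total += product
    if m > 0:
        total += product * factorial(r) // factorial(r - m)
    return total


def theta_hierarchy(t: int, r: int, a: int, b: int, levels: int) -> int:
    """Equation 14 for an l-level hierarchy."""
    top = levels - 1
    coarse = sum(b ** i for i in range(top)) * theta_one_level(a, b)
    fine = b ** top * theta_one_level(-(-t // a ** top), -(-r // b ** top))
    return coarse + fine


def best_clustering(t: int, r: int, levels: int):
    """Step 1: brute-force search; returns (complexity, a, b)."""
    return min(
        (theta_hierarchy(t, r, a, b, levels), a, b)
        for a in range(1, t + 1)
        for b in range(1, r + 1)
    )


def best_levels(t: int, r: int, max_levels: int = 10):
    """Step 2: minimum complexity over levels; returns (complexity, levels)."""
    candidates = [(theta_one_level(t, r), 1)]
    for levels in range(2, max_levels + 1):
        candidates.append((best_clustering(t, r, levels)[0], levels))
    return min(candidates)


print(best_clustering(10, 5, 2))
print(best_levels(10, 5))
```

Exact integer arithmetic is used throughout, so no logarithm is needed to keep the numbers representable; taking the logarithm of the result only matters for plotting.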


VIII.  Conclusions 

•  The process of planning was analyzed and mathematical
expressions were obtained for both the
single level and multi-resolutional cases.

•  The complexity of loosely and tightly coupled JA-SC
planners was analyzed and compared.

•  Some of the effects of ordering constraints on the
planner's complexity were studied.

•  The optimal number of levels which minimizes
the complexity of a hierarchical control system was
found.

ABSTRACT

In  computer  and  electronic  manufacturing,  it  is  very  im- 
portant to  be  able  to  automatically  check  whether  the  sur- 
face mounted  devices  (SMD)  are  correctly  placed  on  the 
printed  circuit  boards.  The  inspection  of  these  boards  has 
to  be  done  on  a  shop  floor,  where  statistical  characteristics 
of  the  noise  vary  so  much  that,  in  essence,  we  only  have 
interval  estimates  for  this  noise. 

We  show  that  under  this  interval  uncertainty,  the  optimal 
image  processing  technique  consists  of  using  Haar  wavelets. 
Wavelets  indeed  lead  to  much  better  results  than  previ- 
ously used  Fourier  transform  techniques. 
On a more fundamental level, our result is a step
towards solving an important problem related to wavelets
and multi-resolution data processing: these methods
often empirically work much better than other methods,
but there are very few theoretical explanations of this
efficiency. Our result shows that, probably, such a
theoretical explanation can be obtained if we take (interval)
uncertainty into consideration.

1.  INTRODUCTION  TO  THE  PROBLEM 

1.1.  Case  Study:  Inspection  of  Surface  Mounted 
Devices 

Modern  electronics  manufacturing  requires  fast  and  effi- 
cient production.  As  a  result,  the  assembly  of  printed  cir- 
cuit boards  (PCB)  with  surface  mounted  devices  (SMD)  is 
usually  done  by  robots.  This  manufacturing  and  assembly 
process  is  usually  at  the  edge  of  the  current  manufacturing 
abilities,  with  a  reasonable  amount  of  PCB  produced  with 
defects.  Therefore,  it  is  extremely  important  to  inspect 
and  test  the  devices  in  order  to  weed  out  the  defective 
ones. 

Most  SMD  devices  are  so  small  that  it  is  very  difficult 
and  very  time-consuming  for  a  human  inspector  to  check 
whether  the  device  is  mounted  at  all,  and  whether  it  is 


mounted  correctly.  This  problem  is  further  complicated  by 
the  fact  that  many  PCB  have  hundreds  of  SMD.  Therefore, 
we  need  an  automatic  inspection.  For  that,  we  take  a  photo 
of  the  board,  and  we  process  the  resulting  image;  see,  e.g., 
[4,5,12]  and  references  therein. 

1.2.  It  is  Necessary  to  Compress  the  Image 

The  image  that  we  need  to  process  is  a  photo.  The  camera 
produces  an  array  of  electronic  signals  f(x)  that  describe 
the  brightnesses  f(x)  at  different  pixels  x.  The  values  f(x) 
corresponding  to  different  pixels  x  are  fed  into  the  com- 
puter. 

In the manufacturing environment, we need to process lots
of images, and for each image, we need to process all these
values fast, ideally, on a reasonably cheap PC-type computer.
If we were to apply complicated processing techniques
to all the pixel values, this would require lots of
computer processing time: a good image consists of about
1 million pixels, and even the most crude images that we
have been processing still consist of $53 \times 27 \approx 1{,}400$ pixels.
It is therefore desirable to compress this data to a few
numbers, and then base our decisions on the compressed
data only.

1.3.  How  Can  We  Compress  the  Image? 

How  is  data  compressed  in  general?  For  example,  how  do 
physicists  represent  a  dependency  y  =  f(x)  between  the 
two  quantities? 

Usually,  this  dependency  is  smooth  (and  even  analytical), 
and  therefore,  the  corresponding  function  f(x)  can  be  rep- 
resented (at  least  for  small  x)  as  a  sum  of  its  Taylor  series: 

$$f(x) = f(0) + f'(0) \cdot x + \frac{1}{2} \cdot f''(0) \cdot x^2 + \ldots$$

Since measurements are usually imprecise, we do not need
all these terms to represent the measurement results; it is
sufficient to take only a few first terms in this expansion:

•  To describe the most crude measurements, it may be
possible to keep only the first (constant) term and
take $f(x) \approx c_1$ for some constant $c_1$.

•  To describe better measurements, we may need linear
terms as well. In other words, we take

$$f(x) \approx c_1 + c_2 \cdot x$$

and use the coefficients $c_1$ and $c_2$ to make decisions
about the analyzed dependence.

•  To get an even better approximation, we may want
to retain quadratic terms as well, and take

$$f(x) \approx c_1 + c_2 \cdot x + c_3 \cdot x^2,$$

etc.

In this case, we start with a basis consisting of the
functions $e_1(x) = 1$, $e_2(x) = x$, $e_3(x) = x^2$, etc., a basis
in which every function (at least every function that is
smooth enough) can be represented as an infinite series

$$f(x) = c_1 \cdot e_1(x) + c_2 \cdot e_2(x) + \ldots + c_k \cdot e_k(x) + \ldots, \qquad (1)$$

and then we take several first coefficients $c_1, \ldots, c_N$ as the
desired compressed representation of the function f(x).
In most physical cases, the monomials 1, x, $x^2$, etc., form
a physically reasonable basis; in other cases, sines or other
functions may be a better first approximation than linear
or quadratic ones. But in general, the idea of using the
first few coefficients of the expansion seems to be a reasonable
data compression method. This approach is used in
imaging as well. In this paper, therefore, we will consider
data compression methods based on this idea.

1.4.  It  is  Best  to  Use  Orthonormal  Bases 

Some of the known bases (e.g., sines and cosines) are
orthonormal in the sense that

$$\int e_i(x) \cdot e_j(x)\,dx = 0 \text{ when } i \neq j; \qquad (2)$$

$$\int e_i^2(x)\,dx = 1 \text{ for all } i. \qquad (3)$$

In principle, a physical basis need not be orthonormal; e.g.,
the monomials do not have this property. However, it is
always possible to transform each basis $\{e_1(x), e_2(x), \ldots\}$
into a new orthonormal basis $\{\tilde e_1(x), \tilde e_2(x), \ldots\}$ by using
the known orthonormalization procedure:

$$\tilde e_1(x) = g_{11} \cdot e_1(x);$$

$$\tilde e_2(x) = g_{21} \cdot e_1(x) + g_{22} \cdot e_2(x);$$

etc., for appropriate coefficients $g_{ij}$; e.g., $g_{11} = 1/\sqrt{\int e_1^2(y)\,dy}$.
When we move to an orthonormal basis, we do not lose
anything, and we do not change the actual compression:
indeed, for all N, the class of all functions of the type
$c_1 \cdot e_1(x) + \ldots + c_N \cdot e_N(x)$ is exactly the same as the class
of all functions of the type $c_1 \cdot \tilde e_1(x) + \ldots + c_N \cdot \tilde e_N(x)$. On the
other hand, we do gain in computation time when we turn
to orthonormal bases (and computational time is what we
try to minimize in the first place): indeed, for orthonormal
bases, the computation of the coefficients becomes
computationally much simpler than for general bases:

$$c_i = \int f(y) \cdot e_i(y)\,dy. \qquad (4)$$

In view of this important advantage of orthonormal bases
(and also in view of the absence of any disadvantages),
it makes perfect sense to use such bases. In the following
text, we will therefore assume that the basis $e_i(x)$ is
orthonormal, i.e., that the conditions (2)-(3) are satisfied.
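
The orthonormalization procedure and formula (4) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: integrals over [0, 1] are replaced by midpoint Riemann sums on a fixed grid, the Gram-Schmidt procedure stands in for "the known orthonormalization procedure", and all names (`inner`, `orthonormalize`, the test function) are hypothetical.

```python
from math import sqrt

N_GRID = 2000
DX = 1.0 / N_GRID
XS = [(k + 0.5) * DX for k in range(N_GRID)]  # midpoints of [0, 1]


def inner(u, v):
    """Discrete stand-in for the integral of u(x) * v(x) over [0, 1]."""
    return sum(a * b for a, b in zip(u, v)) * DX


def orthonormalize(basis):
    """Gram-Schmidt: each new function combines the earlier originals."""
    result = []
    for e in basis:
        v = list(e)
        for q in result:
            c = inner(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = sqrt(inner(v, v))
        result.append([vi / norm for vi in v])
    return result


# Monomials 1, x, x^2 sampled on the grid, then orthonormalized.
monomials = [[x ** k for x in XS] for k in range(3)]
ortho = orthonormalize(monomials)

# Coefficients by formula (4), then reconstruction from N = 3 terms.
f = [1.0 + x * x for x in XS]
coeffs = [inner(f, e) for e in ortho]
f_rec = [sum(c * e[k] for c, e in zip(coeffs, ortho)) for k in range(N_GRID)]
max_err = max(abs(u - v) for u, v in zip(f, f_rec))
print(max_err)
```

Because the test function lies in the span of the first three monomials, the three coefficients reconstruct it up to rounding error, which illustrates that moving to the orthonormal basis does not change what can be represented.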

1.5.  Taking  Measurement  Errors  into  Considera-  ! 
tion 

l 

Brightness  values  are  measured  with  a  non-negligible  inac- 
curacy. In  general,  in  image  processing,  we  use  statistical  j 
methods  of  processing  data  that  are  based  on  the  assump- 
tion  that  we  know  the  probabilities  of  different  imaging 
errors.   It  is  often  possible  to  collect  these  probabilities:  j 
we  measure  the  frequencies  of  different  errors  and  we  see 
that  as  the  number  of  experiments  grow,  these  frequencies 
tend  to  a  certain  limit  which  is  the  desired  probability. 
However,  in  our  case,  we  are  dealing  with  images  taken 
on  the  shop  floor.  The  situation  on  the  shop  floor  changes 
so  frequently  and  so  unpredictably  that  there  are  no  stable 
frequencies  of  errors.  Therefore,  we  do  not  know  the  proba- 
bilities  of  different  value  of  noise;  the  only  information  that  1 
we  have  about  the  noise  is  the  upper  bound  A  on  its  value: 
For  every  point  x,  the  difference  A/(x)  =  f(x)  —  f(x)  be- 
tween the  actual  (unknown)  brightness  /(x)  and  its  mea- 
sured value  f(x)  is  bounded  by  A: 

|A/(x)|  <  A. 

In  other  words,  after  measuring  brightness,  we  only  know, 
for  each  point  x,  the  interval  [f(x)  —  A,  /(x)+A]  of  possible 
values  of  brightness  /(x)  at  this  point  x. 
This  measurement  inaccuracy  leads  to  inaccuracy  in  the  ; 
coefficients  c;,  i.e.,  to  the  difference  Ac,  =  c,  —  c;  between 
the  ideal  values  (4)  of  these  coefficients  (the  values  that 
we  would  have  gotten  if  we  had  the  ideal  image)  and  the 
actually  computed  values 

%  =  J  f(y)-ei(y)dy.  (5)  j 
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If  this  inaccuracy  is  huge,  then  the  resulting  values  of  the 
coefficients  c,-  are  very  unreliable  and  cannot  be  used  to 
make  any  conclusions  about  the  actual  image.  So,  we  must 
make  this  difference  as  small  as  possible. 
How  can  we  express  this  idea  numerically?  If  we  recon- 
struct the  image  from  the  compressed  data,  we  get  the 
following  formula: 

$$\tilde f_{\rm rec}(x) = \tilde m_1(x) + \ldots + \tilde m_N(x),$$

where we denoted

$$\tilde m_i(x) = \tilde c_i \cdot e_i(x). \qquad (6)$$

If we used the precise image instead, we would have gotten
a similar (but more accurate) representation

$$f_{\rm rec}(x) = m_1(x) + \ldots + m_N(x),$$

where we denoted

$$m_i(x) = c_i \cdot e_i(x). \qquad (7)$$

It makes sense, therefore, to estimate the relative quality
of choosing a function $e_i(x)$ as the largest possible value of
the difference

$$\Delta m_i(x) = m_i(x) - \tilde m_i(x). \qquad (8)$$

So, we arrive at the following definitions.
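
The quantities in formulas (5)-(8) can be traced numerically. The following is only a minimal sketch under stated assumptions: the space $X$ is taken to be the interval $[0, 1]$ discretized on a uniform grid, the basis is an illustrative cosine basis (not the basis singled out later in the paper), the "ideal" brightness profile is made up, and all names (`cosine_basis`, `integrate`, `DELTA`, ...) are hypothetical. The error is one admissible realization with $|\Delta f(x)| \le \Delta$; no claim is made about its distribution.

```python
import math
import random

def cosine_basis(i, x):
    """i-th function of an orthonormal cosine basis on [0, 1] (illustrative choice)."""
    return 1.0 if i == 0 else math.sqrt(2.0) * math.cos(i * math.pi * x)

def integrate(values, dx):
    """Riemann-sum approximation of an integral over the grid."""
    return sum(values) * dx

M = 512                                   # grid size (assumption)
dx = 1.0 / M
xs = [(n + 0.5) * dx for n in range(M)]   # midpoints of a uniform grid on [0, 1]
N = 6                                     # number of stored coefficients
DELTA = 0.05                              # upper bound on the measurement error

random.seed(0)
f = [math.exp(-((x - 0.4) ** 2) / 0.02) for x in xs]    # made-up ideal brightness
delta_f = [random.uniform(-DELTA, DELTA) for _ in xs]   # one admissible bounded error
f_meas = [a - d for a, d in zip(f, delta_f)]            # measured brightness

worst_term_error = []
for i in range(N):
    e = [cosine_basis(i, x) for x in xs]
    c = integrate([a * b for a, b in zip(f, e)], dx)            # ideal coefficient
    c_meas = integrate([a * b for a, b in zip(f_meas, e)], dx)  # computed coefficient, formula (5)
    # Difference (8) between the ideal term (7) and the computed term (6),
    # maximized over the grid points x.
    worst_term_error.append(max(abs((c - c_meas) * ev) for ev in e))

print(worst_term_error)
```

For this particular error realization, each entry of `worst_term_error` is the largest value of $|\Delta m_i(x)|$ over the grid; the definitions that follow concern the largest such value over all admissible errors.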

2.  MATHEMATICAL  FORMULATION 
OF  THE  PROBLEM 

Let $X$ be a space with a measure $\mu$ (e.g., a line $R$ or a
plane $R^2$ equipped with the standard (Lebesgue) measure $\mu$).
By an orthonormal basis, we mean a sequence of square-integrable
functions $e_i : X \to R$ ($e_i \in L^2(X)$) that satisfy
the properties (2)-(3).

Definition 1. By a compression scheme, we mean a pair
$(\{e_1(x), e_2(x), \ldots\}, N)$, in which:

• $\{e_1(x), e_2(x), \ldots\}$ is an orthonormal basis, and

• $N$ is an integer.

Definition 2. Let $(\{e_i(x)\}, N)$ be a compression scheme,
and let $\Delta > 0$ be a positive real number. This number $\Delta$
will be called measurement inaccuracy. By the reconstruction
inaccuracy $q_i$ of the $i$-th term $e_i(x)$ of this basis, we mean
the largest possible value of the difference $|\Delta m_i(x)|$:

$$q_i = \max |\Delta m_i(x)|,$$

where the maximum is taken over:

• all points $x \in X$,

• all functions $f \in L^2$,

• all functions $\Delta f \in L^2$ for which $|\Delta f(x)| \le \Delta$ for all
$x \in X$,

and $\Delta m_i(x)$ is determined by the formulas (4)-(8) with
$\tilde f(x) = f(x) - \Delta f(x)$.

Our goal is to find, for each $N$, the basis for which the
reconstruction inaccuracy is the smallest possible. We want
to minimize $N$ different numbers $q_1, \ldots, q_N$. In general, if
we minimize one objective function, it is difficult to expect
that any other objective function will be simultaneously
minimized. However, in our case, we are lucky: for the
cases of $X = R$ and $X = R^2$ that correspond to imaging,
there is a basis for which all the reconstruction inaccuracies
$q_i$ take the smallest possible value.

Definition 4. By a 1D Haar basis, we mean the basis that
consists of the following functions:

• the function $\phi(x)$ that is equal to 1 for $0 \le x < 1$
and to 0 otherwise;

• the function $w(x) = \phi(2 \cdot x) - \phi(2 \cdot x - 1)$ (that is
equal to 1 for $0 \le x < 1/2$, to $-1$ for $1/2 \le x < 1$,
and to 0 for all other $x$);

• functions $w_{jk}(x) = 2^{j/2} \cdot w(2^j \cdot x - k)$, where $j$ and
$k$ are arbitrary integers.
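
The functions of Definition 4 are simple enough to write down directly. The sketch below is an illustration only: the helper `inner`, its integration window $[-4, 4]$ and its grid size are assumptions chosen so that the midpoint rule is exact for the piecewise-constant functions being tested. The checks use only scales $j \ge 0$, for which $\phi$ is orthogonal to every $w_{jk}$.

```python
def phi(x):
    """Indicator of the unit interval [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def w(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return phi(2.0 * x) - phi(2.0 * x - 1.0)

def w_jk(j, k, x):
    """Scaled and shifted wavelet 2^(j/2) * w(2^j * x - k)."""
    return 2.0 ** (j / 2.0) * w(2.0 ** j * x - k)

def inner(g, h, a=-4.0, b=4.0, n=8192):
    """Midpoint-rule approximation of the integral of g(x) * h(x) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += g(x) * h(x)
    return total * dx

# Unit norm of one wavelet, and orthogonality of two different ones.
norm_sq = inner(lambda x: w_jk(1, 0, x), lambda x: w_jk(1, 0, x))
cross = inner(lambda x: w_jk(0, 0, x), lambda x: w_jk(1, 0, x))
print(norm_sq, cross)
```

The same helper can be applied to any other pair of indices; the grid only needs to be fine enough that every breakpoint of the functions involved falls on a cell boundary.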

Comment. In general, if we have a function $w(x)$ that
tends to 0 as $|x| \to \infty$, and for which the functions $w(2^j \cdot x - k)$
are orthogonal to each other, so that after normalization,
we get an orthonormal basis $c_{jk} \cdot w(2^j \cdot x - k)$, then this
basis is called an (orthonormal) wavelet basis; see, e.g.,
[2,7,11,14].

Definition 5. By a 2D Haar basis, we mean the basis
consisting of the functions $f_{ij}(x_1, x_2) = e_i(x_1) \cdot e_j(x_2)$,
where $e_i$ and $e_j$ are functions from the 1D Haar basis.

THEOREM.  For  every  i,  the  imaging  inaccuracy  of  the 
Haar  wavelet  basis  is  the  smallest  possible. 

Comments. 

•  The  proof  of  this  theorem  is  given  in  the  following 
section. 

•  For SMD, wavelets, in particular, Haar wavelets, indeed
lead to much better results than more traditional
Fourier transform techniques [1,5,6,8,9].




•  On a more fundamental level, our result is a step
towards solving the following important problem related
to wavelets and multi-resolution data processing:

-  these methods often empirically work much better
than other methods, but

-  there are very few theoretical explanations of
this efficiency.

Our result shows that, at least in some cases, such
a theoretical explanation can be obtained if we take
interval uncertainty into consideration.

•  A  related  problem  is:  Which  wavelet  is  the  best? 
This  problem  is  raised,  e.g.,  in  [11]. 

-  In [3,10], this problem is analyzed under the assumption
that the measurement errors are random,
Gaussian, and independent. In this case,
the best approximation corresponds to minimizing
the sum of the squares of these errors. Special
wavelets are presented that minimize this
sum.

-  In our paper, we consider a similar problem, but
under the assumption that the measurement errors
belong to the corresponding intervals. The
optimization of the corresponding worst-case error
leads to Haar wavelets.


3.  PROOF 

1.  Let us first show that for an arbitrary basis
$\{e_1(x), e_2(x), \ldots\}$, we have

$$q_i = \Delta \cdot \max_x |e_i(x)| \cdot \int |e_i(y)|\, dy.$$


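Indeed, for each function $f(x)$, we have

$$\Delta m_i(x) = m_i(x) - \tilde m_i(x) = c_i \cdot e_i(x) - \tilde c_i \cdot e_i(x) =$$

$$e_i(x) \cdot \int f(y) \cdot e_i(y)\, dy - e_i(x) \cdot \int \tilde f(y) \cdot e_i(y)\, dy =$$

$$e_i(x) \cdot \int e_i(y) \cdot \Delta f(y)\, dy.$$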

Hence, $|\Delta m_i(x)| = |I| \cdot |e_i(x)|$, where we denoted
$I = \int e_i(y) \cdot \Delta f(y)\, dy$. Since the function $|\Delta m_i(x)|$ is
proportional to $|e_i(x)|$, the maximum of $|\Delta m_i(x)|$ is proportional
to the maximum of $|e_i(x)|$, i.e.,

$$\max_x |\Delta m_i(x)| = |I| \cdot \max_x |e_i(x)|.$$

For a given basis $e_i(x)$, the maximum of the left-hand side
is, therefore, attained when the value $|I|$ is the largest possible.

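In the definition of $q_i$, the maximum is taken over all functions
$\Delta f \in L^2$ for which $|\Delta f(x)| \le \Delta$ for all $x \in X$. For
each such $\Delta f(x)$, we have $|e_i(y) \cdot \Delta f(y)| \le \Delta \cdot |e_i(y)|$, and
therefore,

$$|I| = \left| \int e_i(y) \cdot \Delta f(y)\, dy \right| \le \int |e_i(y) \cdot \Delta f(y)|\, dy \le$$

$$\int \Delta \cdot |e_i(y)|\, dy = \Delta \cdot \int |e_i(y)|\, dy.$$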

On the other hand, for arbitrary $B$, we can take
$\Delta f(y) = \Delta \cdot {\rm sign}(e_i(y))$ for $|y| \le B$ and $\Delta f(y) = 0$ otherwise. For
this choice of $\Delta f(y)$, we have

$$I = \int e_i(y) \cdot \Delta f(y)\, dy = \Delta \cdot \int_{|y| \le B} |e_i(y)|\, dy.$$

The larger $B$, the closer this value to $\Delta \cdot \int |e_i(y)|\, dy$. Therefore,
the largest possible value of $I$ is indeed $\Delta \cdot \int |e_i(y)|\, dy$,
and therefore, the largest possible value $q_i$ of $|\Delta m_i(x)|$ is
indeed equal to

$$\Delta \cdot \max_x |e_i(x)| \cdot \int |e_i(y)|\, dy.$$

The formula is proven.
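
The formula of Part 1 is easy to evaluate numerically. The sketch below is an illustration under stated assumptions: the domain is restricted to $[0, 1]$ and discretized on a midpoint grid, the sinusoid is a comparison function chosen here (it does not come from the paper), and the function names are hypothetical. It evaluates $\Delta \cdot \max|e| \cdot \int |e|$ for the Haar wavelet $w$ and for a unit-norm sinusoid, and confirms that the extremal error $\Delta f = \Delta \cdot {\rm sign}(e)$ used in the proof attains this value on the grid.

```python
import math

def w(x):
    """Haar wavelet on [0, 1): +1 on the first half, -1 on the second half."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def sine(x):
    """Unit-norm sinusoid on [0, 1] (illustrative comparison function)."""
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * x)

def grid_values(e, n):
    """Values of e at the midpoints of a uniform n-cell grid on [0, 1]."""
    dx = 1.0 / n
    return [e((i + 0.5) * dx) for i in range(n)], dx

def reconstruction_inaccuracy(e, delta, n=4096):
    """delta * max|e| * integral of |e|, approximated on the grid."""
    values, dx = grid_values(e, n)
    return delta * max(abs(v) for v in values) * sum(abs(v) for v in values) * dx

def worst_case_term_error(e, delta, n=4096):
    """max over x of |e(x) * integral(e * delta_f)| for delta_f = delta * sign(e)."""
    values, dx = grid_values(e, n)
    delta_f = [delta * math.copysign(1.0, v) if v != 0.0 else 0.0 for v in values]
    integral = sum(v * d for v, d in zip(values, delta_f)) * dx
    return max(abs(v * integral) for v in values)

DELTA = 0.1
q_haar = reconstruction_inaccuracy(w, DELTA)
q_sine = reconstruction_inaccuracy(sine, DELTA)
print(q_haar, q_sine)
```

Analytically, the sinusoid gives $\Delta \cdot \sqrt{2} \cdot (2\sqrt{2}/\pi) = 4\Delta/\pi$, which is larger than the value $\Delta$ obtained for the Haar wavelet.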

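2.  Let us now show that for an arbitrary basis, we have
$q_i \ge 1$. Here and in Part 3, the factor $\Delta$ from the formula of
Part 1 is omitted, i.e., we take $\Delta = 1$; for a general $\Delta$, all the
values $q_i$ are multiplied by $\Delta$.

Indeed, clearly, for all $y$,

$$\max_x |e_i(x)| \ge |e_i(y)|.$$

Therefore,

$$q_i = \max_x |e_i(x)| \cdot \int |e_i(y)|\, dy =$$

$$= \int \max_x |e_i(x)| \cdot |e_i(y)|\, dy \ge \int |e_i(y)| \cdot |e_i(y)|\, dy.$$

Since $\{e_i(x)\}$ is an orthonormal basis, the right-hand side
is equal to 1 and therefore, $q_i \ge 1$.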

3.  To complete the proof, we must show that for the Haar
basis, $q_i = 1$.

Indeed, for each of the functions from this basis, there
exists a non-zero real number $C$ such that for every $x$,
either $e_i(x) = 0$, or $|e_i(x)| = C$. Therefore,

$$q_i = \max_x |e_i(x)| \cdot \int |e_i(y)|\, dy = C \cdot \int |e_i(y)|\, dy = \int C \cdot |e_i(y)|\, dy.$$

We know that for every $y$, either $e_i(y) = 0$, or $|e_i(y)| = C$.
Therefore:

•  When $e_i(y) = 0$, we have $C \cdot |e_i(y)| = 0 = e_i^2(y)$.

•  When $|e_i(y)| = C$, then $C \cdot |e_i(y)| = e_i^2(y)$.

In both cases, $C \cdot |e_i(y)| = |e_i(y)|^2$ and therefore,

$$q_i = \int C \cdot |e_i(y)|\, dy = \int |e_i(y)|^2\, dy.$$

Since $\{e_i(y)\}$ is an orthonormal basis, we have $q_i = 1$. The
statement is proven, and so is the theorem.
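
As a worked instance of Part 3 (added here for illustration, using only the functions $w_{jk}$ of Definition 4): each $w_{jk}$ is non-zero on an interval of length $2^{-j}$, and on this interval its absolute value is the constant $C = 2^{j/2}$. Hence

```latex
\max_x |w_{jk}(x)| \cdot \int |w_{jk}(y)|\,dy
  = 2^{j/2} \cdot \left( 2^{j/2} \cdot 2^{-j} \right)
  = 2^{j/2} \cdot 2^{-j/2}
  = 1
\quad \text{for all integers } j, k,
```

so the product that enters the formula of Part 1 depends neither on the scale $j$ nor on the shift $k$.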
ABSTRACT

The aim of this paper is to propose a new methodological framework
for large-scale intelligent systems based on bioreengineering
concepts and, more generally, on artificial life principles. Two
viewpoints on the content of artificial life are presented: a narrow one, where
artificial life models are considered as a special case of multi-agent
systems, and a general one, concerned with the specification of the basic
principles and mechanisms of living. These approaches underlie the
suggested model of a fuzzy evolutionary multi-agent system, seen here
as the kernel of an intelligent enterprise. After specifying the main properties
of intelligent and virtual enterprises, the principal architectures of a basic
fuzzy multi-agent system and a generic fuzzy evolutionary multi-agent
system are built and studied in the context of enterprise integration
and reengineering problems.

Keywords:  Artificial  Intelligence,  Multi-Agent  Systems, 
Distributed  Intelligence,  Bioreengineering,  Artificial  Life, 
Evolutionary  Semiotics,  Intelligent  Enterprise,  Fuzzy  Evolutionary 
Multi-Agent  System 

1.  INTRODUCTION 

Classical AI models are permeated by individual rationalism,
because they reduce natural intelligence mainly to rational
individualized problem solving with the use of some heuristics.
Such a conventional view seems to be very restrictive: it does
not take into account the emergence and evolution of
intelligence in communication, cooperation and coordination
processes. A brilliant criticism of the classical AI paradigm may
be found in [1], where Winograd and Flores suggested a
new approach to modeling the phenomena of intelligence, based on
Searle's speech act theory. This approach, which studies the
interactions between two agents - Customer and Performer -
has been successfully applied to represent the coordination
processes in organizations by means of a workflow model.
Another important milestone, which has considerably
contributed to the change of the classical AI paradigm, was the
book [2] by Minsky, where a social AI paradigm was
introduced in detail. From this social standpoint, the
modeling (and specifically, semiotic modeling) of abstract
agents interacting in some organization represents the main
content of AI studies [3].

Such a social paradigm underlies two main tendencies in
modern AI - integration and decentralization. Following
Minsky, for the sake of versatility, one can exploit and manage
the advantages of several types of representations at the same
time. Specifically, the problem is to find out how to build an
efficient bridge between top-down and bottom-up design [4].
One may speak about various levels of integration. First of all,
since the 1980s, expert systems have been coupled with
conventional information technologies, which has led to the
arrival of hybrid expert systems or integrated intelligent
systems.
The development of integrated intelligent systems has
necessitated the elaboration of some empirical and theoretical
integration and communication techniques, in particular, the
blackboard architecture, which is also of primary concern in
distributed AI systems. Such integrated systems are required
to support various knowledge models and various reasoning
types, and to realize both knowledge and image processing.

Furthermore, work proceeds on integrating some
intelligence attributes (or some specific working definitions of
intelligence) with the aim of obtaining non-linear, synergetic
effects [5] by compensating for the shortcomings and enhancing
the advantages of component models in coupled models. Some
typical  examples  are:  integrated  neuro-logical  and  specifically 
neuro-fuzzy  models  [6-8]  underlying  Zadeh's  soft  computing 
[9]  and  often  using  logic-based  neurons  and  distributed  fuzzy 
system  modeling  [10],  neuro-optical  models  by  Kuznetsov 
[11],  computational  intelligence  models  [12],  integrated 
concurrent  engineering  models  [13],  Petri  net  models  of  fuzzy 
neural  networks  [14],  etc. 

The most well-known instances of hybrid AI strategies are
computational intelligence and soft computing. In the
framework of soft computing, three AI trends - logical,
neuronal and evolutionary - or, in other words, three
intelligence aspects - uncertainty/fuzziness processing,
learning and adaptation - are amalgamated by representing
fuzzy production rules in a trainable neural network which is
optimized with the use of genetic algorithms. These three
components also form the kernel of a more general area called
computational intelligence, which also involves chaotic
systems, inductive learning, probabilistic reasoning and some
other techniques.

Integrated concurrent engineering models, which aim to model
the design process (and more generally, the product's life cycle),
require building hybrid (fuzzy modal and fuzzy nonmonotonic)
logics. Finally, neuro-optical models suggest a
coupling of connectionism with wave (oscillation) theory.

The second, more fundamental tendency is closely related to the
first one: AI distribution and/or decentralization. Here
distributed intelligent systems may have a unique centralized
control module (or agent), whereas the control in decentralized
systems is reduced to local interactions between agents [15].
Sometimes, this difference is taken as the main classification
criterion for dividing multi-agent systems (MAS) into: a)
distributed AI systems composed of a few intelligent (and
often heterogeneous) agents (where one of them may play the
subordinator role); b) artificial life (in the narrow sense of the
term) or purely collective intelligence [16, 17] systems, where
intelligent group behavior emerges from local communications
between many not necessarily intelligent agents.


2.       SYSTEMIC  APPROACH  VERSUS 
LOGICAL  APPROACH  IN  AI 

The classical intelligent systems using symbolic (or, more
specifically, logical) approaches in AI based on cognitive
psychology satisfy the following main postulates [18, 19]:

1.  Centralization postulate;

2.  Internal representation postulate (AI epistemology);

3.  Knowledge-centered postulate;

4.  Disembodied AI postulate;

5.  Closed world assumption (CWA);

6.  Independence postulate (cognitive processes may be
considered independently from evolution);

7.  Concertation postulate (between cognition and language in
AI);

8.  Homogeneity postulate (interacting models have the same,
homogeneous nature).

Nowadays, due to the integration and decentralization
tendencies mentioned above, these previously fixed
postulates must be revised. Which postulates are to be revised
depends on the kind of integrated models and on the
integration basis. For example, the advocates of artificial
neural networks related to neurophysiological studies
emphasize the crucial role of knowledge distribution, massive
parallelism and learning, as well as the emergence of symbolic
representations in the system of artificial neurons. The
development of evolutionary modeling in AI means
revising the independence postulate. Such AI (or close to AI)
trends as moboticism [20] and virtual reality [21] represent
viewpoints completely opposite to the logical
approach. Thus, moboticism, which introduces «beings» as reactive
agents without representations, focuses on the importance of
coordinated local reactions for intelligent behavior and denies
the ideas of knowledge-centrism, disembodiment, rational
kinematics, centralization and independence. In addition,
virtual reality systems presuppose direct experience acquisition
and introduce artificial sensory-motor coordination. A rather
close but less radical position concerning the withdrawal from
the postulates above is held by the proponents of such approaches
as the «society of mind» by Minsky [2], decentralized [15] and
distributed AI [22, 23], etc.

In our opinion, the essence of the paradigm change in modern
AI may be captured by recalling the title of the well-known
monograph «From Being to Becoming» by Prigogine [24]
(here it is interpreted as a call to model the genesis of
intelligence for AI needs) and the words of Bobrow [25]: «to
throw away the isolation assumptions». It is necessary to
introduce the interaction dimension, which makes it possible to
move from individual intelligent systems to intelligent organizations and
societies of intelligent agents.
So, a convergence of symbolic, neuronal and evolutionary
approaches  takes  place  in  modern  AI.  Here  various  integration 
schemes  lead  to  the  generation  of  hybrid  (or  synergetic)  AI. 
Below  we  shall  discuss  artificial  life  issues  in  the  context  of 
intelligent  organizations. 

3.       ARTIFICIAL  LIFE 

Artificial life (AL) is an interdisciplinary field that focuses on
the functional analysis, modeling and simulation of living
systems. The origins of AL go back to von Neumann's theory of
self-reproducing automata [26] and the complexity theory
introduced by Kolmogorov [27] and Chaitin [28]. Its first
practical results are related to the simulated evolution concept [29]
and genetic algorithms [30]. The official AL birth date is 1987,
when Langton organized at Los Alamos (USA) an
interdisciplinary workshop on the synthesis and simulation of
living systems [31] (recent results in the field of AL are
collected in [32]).

Actually, there are two different viewpoints on AL. In a narrow
sense, AL is seen as a branch of multi-agent systems, where
intelligent behavior emerges from local interactions between
agents, and there is no centralized control agent [16,
17]. More precisely, this viewpoint on the content of AL may be
expressed by the following 5 statements:

•  the MAS consists of a «population» of rather simple and
relatively independent agents;

•  each agent itself specifies its own reactions to events in
its local environment and its interactions with other
agents;

•  there is no special agent that subordinates other agents;

•  there are no rules specifying the global behavior of agents;

•  all behavior, properties and structure on the superior
(collective) level emerge from local interactions between
agents.
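
The five statements above can be illustrated by a toy population. The following sketch is purely hypothetical and is not a model from the cited literature: the ring topology, the binary agent states and the local majority rule are assumptions made here for illustration. Each agent reacts only to its own state and to its two neighbours, there is no coordinating agent and no global rule, and yet a collective-level regularity appears: the number of disagreeing neighbour pairs never increases.

```python
import random

def step(states):
    """One synchronous update: every agent adopts the majority state
    among itself and its two neighbours on the ring."""
    n = len(states)
    return [
        1 if states[(i - 1) % n] + states[i] + states[(i + 1) % n] >= 2 else 0
        for i in range(n)
    ]

def boundaries(states):
    """Number of neighbouring pairs that disagree (a collective-level measure)."""
    n = len(states)
    return sum(states[i] != states[(i + 1) % n] for i in range(n))

def run(n_agents=60, n_steps=20, seed=1):
    """Simulate the population and record the disagreement count at each step."""
    rng = random.Random(seed)
    states = [rng.randint(0, 1) for _ in range(n_agents)]
    history = [boundaries(states)]
    for _ in range(n_steps):
        states = step(states)
        history.append(boundaries(states))
    return states, history

final_states, disagreement = run()
print(disagreement)
```

The update rule mentions only an agent and its immediate neighbours; the homogeneous domains that form are a property of the population as a whole, not of any individual rule.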

In a more general context, AL is a multidisciplinary field that
integrates some trends of biology and cybernetics, artificial
intelligence and robotics, physics and mathematics, chemistry
and synergetics (Figure 1). In this sense, AL embraces and
extends AI studies. Just as artificial intelligence tends to
comprehend and to model various phenomena of the human
psyche and specifically of natural intelligence (or at least to
solve problems or to perform actions related to intelligent
behavior), AL deals with the general principles and mechanisms
of life. Artificial life aims at abstracting crucial properties of
living systems (for example, autoreproduction,
autoconservation and autoregulation) and putting them into
practice via computer-based simulation and the development of
various applied systems. This definition is based on a
functional-structural approach to life phenomena: the
essence of life is mainly related to specific modes of
organizing units, functions and processes and cannot be
completely reduced to the properties of the material substratum
(carbon chains and DNA structures). So, according to
Kolmogorov, the basic AL metaphor can be formulated in the
following way: if an artificial organization is in some sense
equivalent to a living organism and its output
functions are analogous to those of a living system, then such an
organization may be referred to as "living". In other words, AL
considers mainly "living processes" beyond the usual scope of
organic chemistry and concentrates on their computer analysis
and simulation (virtual life processes mediated by computer).

In this paper we shall use both the narrow and the general AL
concepts. On the one hand, in our model of a fuzzy evolutionary
multi-agent system (FEMAS) underlying the concept of
intelligent organizations, the analysis of local interactions
between agents and of the coordination links specifying decentralized
system management will be of primary concern. On the other
hand, general life principles, such as the autonomy, homeostasis
and autopoiesis principles, and specifically the basic principles of
evolutionary theory, including evolution, adaptation for survival,
mutation and natural selection, will be used to drive the
evolution of the FEMAS model.


4.       INTELLIGENT  ENTERPRISES 

Generally, the concept of an intelligent enterprise includes both
aspects of organizational structure and the way in which the
enterprise develops and functions.

On  the  one  hand  (from  a  structural  viewpoint),  enterprise 
intelligence  is  attributed  to  the  progressing  reversibility  of  its 
functionally-dynamic  structures,  or  the  increasing  number  of 
symmetrical  horizontal  relations  that  implies  the  higher  level 
of  coordination  and  collaboration. 

On the other hand, following the definition of intelligence given
by Piaget [33], enterprise intelligence as a systemic attribute
may be defined from a behavioral viewpoint as a flexible and
at the same time stable equilibrium of enterprise behavior. If
the forms of the enterprise's influence on its environment become
more complex and variable, leading to progressive
compositions and associations, then its behavior may be
referred to as more intelligent. Generally speaking, intelligence
includes the capacities of learning, understanding, and acquiring
and using competences to adapt to new situations.
An intelligent enterprise is characterized not only by the capacity to
use the knowledge and competences of its personnel
efficiently; it is also conceived as a collective
learning system that can continuously learn and
transform itself in order to attain its goals. So
enterprise intelligence underlies the ability to continuously develop the
capacity to co-evolve and to build its own future, and associates
the level of competitiveness with the capacity to learn and to adapt
faster than competitors.

Figure 1. Branches and models of artificial life

Definition 1. Intelligent enterprise (IE) is a kind of post-Taylorian
enterprise that emerges as a result of both inter-enterprise
and intra-enterprise communication processes,
enabling enterprise semiosis, and is driven by the principles of
maximal  adaptation  to  its  environment,  optimal 
autoreengineering  for  autoconservation,  and  activity 
autoregulation  via  permanent  knowledge  mining  and 
accumulation,  needs  anticipation  and  strategic  planning,  shared 
motivation  and  collective  learning. 

So the genesis of enterprise intelligence in communication
processes means that the enterprises and their divisions act as
semiotic entities. Unlike conventional formal systems,
semiotic systems, formally introduced by Pospelov [34] and
thoroughly studied by Meystel [35], are open and take into
account the semantic and pragmatic aspects of
knowledge; moreover, they support multi-resolution,
nonmonotonic and common-sense reasoning. Thus, semiotic
systems admit a variable sign interpretation and open up the
possibility of changing knowledge and its processing mode
(inference rules, axioms and even language).


So enterprise semiosis is the process of enterprise knowledge
generation, accumulation and transformation through
communication, in which the syntax, semantics and
pragmatics of this knowledge are specified. Here the syntax is
associated with the enterprise
knowledge structure, semantics refers to the sense (a relation
"knowledge - source") and pragmatics corresponds to some
knowledge use (a relation "knowledge - consumer").

The intrinsic features of intelligent enterprises are the
capacities to deal with uncertain and fuzzy information. The
main "product-based" or "service-based" sources of
imprecision and fuzziness in the enterprise environment are: (a)
multiple interpretation in "client - supplier" communications,
in particular, product specification or service request
formulation in terms of natural language, which leads to
semantic ambiguity and multivalued interpretation; (b) the
incomplete and fragmentary character of available data on new
products and services; (c) the lack or complexity of analytical
equations describing future product parameters and, as a
consequence, rather vague knowledge about their relationships;
(d) fundamental limitations on the precision in specifying both
quantitative and (in particular) qualitative parameters.

Below we shall introduce fuzzy evolutionary multi-agent
systems as a formal tool for representing intelligent enterprises.

5.       FUZZY  EVOLUTIONARY 
MULTI-AGENT  SYSTEMS 

Evolutionary modeling of multi-agent systems in the context
of intelligent and virtual enterprises represents an important
step from the traditional mechanistic vision toward a new living
paradigm in enterprise engineering and reengineering. On the
one hand, this living paradigm is inspired by modern artificial
life approaches and specifically by the neo-Darwinian approach
based on the principles of survival, reproduction and
mutation. On the other hand, the introduction of evolutionary
mechanisms into MAS provides a natural framework for
studying various situations of interaction between human and
artificial agents within the enterprise, taking into account
mental and social aspects. A comparative analysis of agents'
goals, resources and competences makes it possible to specify a concrete
intermediate type of interaction, from pure competition to pure
cooperation, or from pure conflict to pure collaboration.

In this section, the concept of a fuzzy evolutionary multi-agent
system is introduced and formalized, and a generic FEMAS
architecture is proposed. We consider a special case of
intelligent organization, where agents' roles are specified
through the virtual enterprise metaphor [37, 38].




Definition 2. Virtual enterprise (VE) is a sort of meta-enterprise
merging in an electronic way the objectives, the
resources  and  the  competences  of  different  enterprises  in  order 
to  develop  complex  innovative  projects  and  to  obtain  new 
world-class  products. 

So  the  birth  of  VE  as  an  innovative  intelligent  organization  is 
connected  with  selecting  and  sharing  common  goals,  unique 
competences,  compatible  resources,  including  the  most 
advanced  technologies,  from  different  enterprises  situated  in 
different  places  around  some  complex  project  that  cannot  be 
achieved  by  isolated  VE  components.  The  composition  of  VE 
also  fulfills  a  compensatory  function:  it  gives  the  opportunity 
to  diminish  disadvantages  and  to  enhance  advantages  of 
heterogeneous  components.  In  particular,  it  permits  bridging 
the  gap  between  big  enterprises  (powerful  but  not  very 
reactive)  and  small  enterprises  (rather  weak  but  ordinarily 
having  prompt  reaction  and  better  adaptation  facilities  to 
changes). 

So the development of VE environments makes it possible to model
multi-site and multi-trade enterprise aspects, including the
transactions between producer, customer, supplier and
subcontractor. It is aimed at attaining better enterprise
flexibility, reactivity or agility compared with its components
and integrates various information, applications and processes
concerning a given project.

Definition 3. A basic fuzzy multi-agent system (BFMAS) is a
pair

BFMAS = (X, R), (1)

where X = {1, ..., n} is a set of heterogeneous intelligent
agents and R is a family of basic types of (generally) fuzzy
relations between agents.

According to the virtual enterprise metaphor, BFMAS is seen as a
FEMAS cell, and the set X consists of 5 elements (roles):
X = {1, ..., 5}, where: 1 - Customer agent; 2 -
Coordinator agent; 3 - Performer agent; 4 - Subordinator agent;
5 - Observer (Meta-Coordinator) agent. Let us discuss the
roles of these intelligent agents in more
detail.
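
As a rough illustration of Definition 3 and of the five roles, the pair (X, R) can be written down as a small data structure. This is only a sketch under stated assumptions: the class names, the dictionary-based encoding of a fuzzy relation, the relation type "requests" and the membership degree 0.9 are all hypothetical and are not taken from the paper, which does not fix the family R at this point.

```python
from dataclasses import dataclass, field
from enum import IntEnum

class Role(IntEnum):
    """The five roles of the set X = {1, ..., 5}."""
    CUSTOMER = 1
    COORDINATOR = 2
    PERFORMER = 3
    SUBORDINATOR = 4
    OBSERVER = 5  # Meta-Coordinator

@dataclass
class FuzzyRelation:
    """A fuzzy binary relation between agents: ordered pairs of agents
    are mapped to membership degrees in [0, 1]."""
    name: str
    degrees: dict = field(default_factory=dict)

    def assign(self, source, target, degree):
        if not 0.0 <= degree <= 1.0:
            raise ValueError("membership degree must lie in [0, 1]")
        self.degrees[(source, target)] = degree

    def degree(self, source, target):
        # Pairs that were never assigned are treated as unrelated (degree 0).
        return self.degrees.get((source, target), 0.0)

@dataclass
class BFMAS:
    """Basic fuzzy multi-agent system: the pair (X, R)."""
    agents: frozenset   # X, the set of agents (here: roles)
    relations: dict     # R, relation types indexed by name

cell = BFMAS(agents=frozenset(Role), relations={})
requests = FuzzyRelation("requests")                     # hypothetical relation type
requests.assign(Role.CUSTOMER, Role.COORDINATOR, 0.9)    # made-up degree
cell.relations[requests.name] = requests
```

Keeping each relation type as a separate object leaves room for the crisp case as well: a non-fuzzy relation is simply one whose degrees are all 0 or 1.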

The Customer is defined as the agent which initiates and
determines the tasks to accomplish or the work to do for
satisfying its need. Customers evaluate the performed work and
declare whether it satisfies them. So the customer may be
any natural or legal person outside or inside the enterprise.
The customer function, based mainly on gnoseological
considerations (specifying the content of the work), may be
seen in some sense as the enterprise's service function and
expressed in terms of conversation systems by questions of
the type "What to produce?" or "For what can the product be
used?". From the informational viewpoint, it determines the
semantics of information exchange during virtual enterprise
agents' transactions.

The Coordinator (or the Manager) is defined as the agent that
creates, supports and coordinates a network of requests and
commitments necessary to perform some work. The
Coordinator distributes the work and selects available
resources to ensure a good performance. Hence his (or its, in the
case of an electronic coordinator) principal function may be
referred to as the resource function. He also inspects work
performance and evaluates the quality. So the coordinator may
be seen as a mediator between the customer and the performer.
The Coordinator function, based mainly on ontological
considerations, may be considered as the enterprise's operational
function and corresponds to questions of the type "Who,
When and Where must do the work?" Moreover, it is related
to the structural aspect of information.

The Performer is defined as the agent (or team of agents)
personally responsible for the work being done and for
declaring its completion. It needs to be emphasized that mainly
praxeological considerations related to the methods and
techniques of accomplishing the given work underlie the performer's
activity. The Performer function is in fact the enterprise's basic
technical function containing its know-how; it corresponds
to the question "How to do the work?" and specifies the
pragmatics of information exchange in agents' transactions.

The Subordinator (or the Supervisor) is defined as the agent
carrying out the integrating meta-function of organizing and
supervising the virtual enterprise functions mentioned above,
and consequently governing the enterprise processes, viewed
here in the scope of the "customer - coordinator - performer"
transactions. His principal functions are the supervision,
inspection and evaluation of enterprise functioning (including
the evaluation of the agents' efficiency). He (it) has the power
to intervene in critical situations and to redistribute the
enterprise's resources in a timely manner. For instance, he (it)
can replace the coordinator and hence organize a new business
process. The Subordinator functions concern the specification of
virtual enterprise policies, including the choice of customers,
the selection of leading management personnel, the definition of
functional duties, and conflict resolution. In summary, this
means activating enterprise functions and neutralizing
enterprise dysfunctions.

So the Subordinator meta-function (in contrast to the
functions performed by the other agents) is based on axiological
considerations related to the field of motivation and to the
development of principal objectives and strategies. It requires
anticipating future tendencies and investigating possible ways of
development. Specifically, it includes questions like "Why
should one begin to manufacture or to sell a new product (to
render a new service), and how much of it will be produced
(sold)?" From the semiotic viewpoint, it gives the necessary
meta-information and ensures the semiosis of the virtual
enterprise cell.

In the general case the family of fuzzy relations R contains
three types of relations and may be represented by a partition

$R = R_1 \cup R_2 \cup R_3$  (2)

where $R_1 = \{1 \to 2;\ 2 \to 1;\ 2 \to 3;\ 3 \to 2;\ 1 \to 3;\ 3 \to 1\}$
is a set of fuzzy horizontal relations;

$R_2 = \{4 \to 2;\ 4 \to 1;\ 4 \to 3\}$ is a set of fuzzy vertical
top-down relations;

$R_3 = \{2 \to 4;\ 1 \to 4;\ 3 \to 4\}$ is a set of fuzzy vertical
bottom-up relations.

Depending on the context, the set $R_1$ may contain
coordination, collaboration or cooperation relations, and the set
$R_2$ subordination, competition or conflict relations. The set
$R_3$ mainly contains informing relations.

These  fuzzy  relations  can  be  specified  by  gradual  transitions 
on  continuous  «gray»  scales.  They  depend  on  the  following 
principal  factors:  a)  goal  compatibility;  b)  presence  of 
commitments  and  mutual  responsibility;  c)  level  of 
competences;  d)  level  of  individual  resources. 

So the agents above, together with the appropriate relations,
compose the basic multi-agent structure (or cell) for a virtual
organization, called SuPerCoC (derived from Subordinator -
Performer - Coordinator - Customer). This cell is invariant
with respect to the organization type and activities, as well as
to environmental influences.

In this paper the enterprise cell representation is reduced to a
two-level BFMS represented by a tetrahedron whose base is
formed by the agents (1), (2), (3) together with their links
(the PerCoC base) and whose apex corresponds to the
supervising agent (4) (Figure 2). This representation is inspired
by a structural-functional unit introduced in [39].

Here the exchange of roles is possible. Moreover, each agent
can play several roles concurrently (the cases of coinciding (1)
and (2), (1) and (4), or (1), (2) and (4)), which reduces the
tetrahedron SuPerCoC to the triangle PerCoC or the segment
PerC.

Figure 2. Representation of basic fuzzy multi-agent system

Definition 4. A generic fuzzy evolutionary multi-agent
system is given by a sextuple

$\mathrm{FEMAS} = (X, R, A, A_c, S, E)$,  (3)

where:

$X = \{1, \ldots, n\}$ is a heterogeneous set of agents; R is a family
of basic fuzzy relations between agents; A is a set of agents'
actions; $A_c \subset A$ is a finite set of agents' communication acts
called a communication protocol; S is a set of fuzzy states of
FEMAS; E is a set of fuzzy evolutionary strategies in the
multi-agent system. The sets A, $A_c$, S and E may include fuzzy
parameters.

A typical communication protocol for VE agents can be
written in the following form: P = {request, offer, accept,
decline, transfer, cancel, etc.}. Here some linguistic modifiers
and quantifiers like very, rather, often, rarely, possibly,
necessarily, etc., which may act on the verbs above, as well as
inexact time constraints, imply the need for fuzzy models given
by possibility distributions.

The  state  space  includes  such  states  as  initial,  normal,  critical, 
degraded,  etc.  Each  evolutionary  strategy  implies  some 
transformation  in  FEMAS  such  as  vertical  or  horizontal 
growth,  bifurcation,  self-reproduction,  etc. 
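
The sextuple of Definition 4 and the partition (2) map naturally onto a small data structure. The following is only a sketch under assumptions: the class name `Femas`, its field names, the uniform membership degree 0.5 and the extra action `"produce"` are hypothetical and do not come from the paper. Agents are numbered as in the text (1 Customer, 2 Coordinator, 3 Performer, 4 Subordinator).

```python
from dataclasses import dataclass

SUBORDINATOR = 4  # apex of the tetrahedron; agents 1-3 form its base


def relation_type(source, target):
    """Classify a directed link according to the partition R = R1 u R2 u R3."""
    if source == SUBORDINATOR:
        return "vertical top-down"    # R2
    if target == SUBORDINATOR:
        return "vertical bottom-up"   # R3
    return "horizontal"               # R1


@dataclass
class Femas:
    agents: set        # X
    relations: dict    # R: (i, j) -> membership degree in [0, 1]
    actions: set       # A
    protocol: set      # Ac, required to be a subset of A
    states: set        # S
    strategies: set    # E

    def __post_init__(self):
        if not self.protocol <= self.actions:
            raise ValueError("protocol Ac must be a subset of actions A")
        for (i, j), degree in self.relations.items():
            if i == j or not {i, j} <= self.agents:
                raise ValueError(f"invalid relation {i}->{j}")
            if not 0.0 <= degree <= 1.0:
                raise ValueError("membership degree must lie in [0, 1]")


protocol = {"request", "offer", "accept", "decline", "transfer", "cancel"}
cell = Femas(
    agents={1, 2, 3, 4},
    relations={(i, j): 0.5 for i in range(1, 5) for j in range(1, 5) if i != j},
    actions=protocol | {"produce"},  # "produce" is a made-up non-communication act
    protocol=protocol,
    states={"initial", "normal", "critical", "degraded"},
    strategies={"vertical growth", "horizontal growth",
                "bifurcation", "self-reproduction"},
)

# Count how many links of the SuPerCoC cell fall into each class of (2).
counts = {}
for source, target in cell.relations:
    kind = relation_type(source, target)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)
```

Storing each relation as a degree in [0, 1] is one simple way to express the gradual "gray" scales mentioned above; a fuller model would attach possibility distributions instead of single numbers.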

A basic FEMAS architecture underlying the representation of
intelligent organizations is given in Figure 3.

Figure 3. Basic FEMAS architecture

The processes of emergence, development and degradation of an
intelligent organization may be modeled on the basis of FSU. The
results of using different evolutionary strategies in enterprise
management may thus be specified by defining some
transformations in FSU, such as its vertical or horizontal
growth, self-reproduction or bifurcation, etc. For example, the
adoption of an empowerment strategy may be seen as the
delegation of the Subordinator's (4) power to the Coordinator (2),
who becomes the Subordinator on the inferior level (4') and hence
initiates the next coordination process
$(1' \leftrightarrow 2' \leftrightarrow 3')$, etc. (Figure 3). This
top-down scheme may be attributed to enterprise growth via
$(2 \to 4')$-type transformations. On the other hand, the inverse
bottom-up scheme is possible, where the next organization level
is generated after exhausting the resources of the preceding
level. Here the Ex-Subordinator (4) is affected by new
subordination and coordination links, and becomes the
Coordinator (2') on the new level. This $(4 \to 2)$ model
corresponds, for instance, to generating enterprise associations
and consortiums.

Figure 4. Representation of autonomous team
evolution: a case of Coordinator's degraded functions

Moreover, both normal and degraded modes of virtual enterprise
management processes may be modeled on the basis of FSU. The
normal management process is associated with the coordination
loop $(1 \leftrightarrow 2 \leftrightarrow 3)$ and only episodic
activation of the $(4 \to 2)$ and $(2 \to 4)$ relations
(Principle of Exceptional Subordinator's Intervention).
Besides, the direct "customer-performer" relations $1 \to 3$ and
$3 \to 1$ are not activated either. Here a complete routing may
usually have the form
$(1 \to 2 \to 4 \to 2 \to 3 \to 2 \to 4 \to 2 \to 1)$.

Degraded mode may result, for example, from conflicts
between the Coordinator and Customers, or the Coordinator and
the Performer. In this case the informing links $1 \to 4$ or
$3 \to 4$ may be used, and as a rule the Subordinator's
intervention, expressed by the $(4 \to 1)$ and $(4 \to 3)$ links,
is necessary. For instance, in case of conflict between the
Customer and the Coordinator, the Subordinator can temporarily
perform the Coordinator functions and hence activate the loop
$(4 \to 3 \to 1 \to 4)$. Moreover, he can replace the Coordinator
2 by some other agent 2'. For instance, the Subordinator may
entrust the Coordinator functions 2 to the Ex-Performer 3, who
finds a new Performer; this corresponds to horizontal evolution
of the enterprise FSU via $(3 \to 2')$ transformations. These
situations are depicted in Figure 4.
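
The distinction between normal and degraded management modes can be illustrated by classifying a routing according to the links it activates. The rule below is one possible reading of the text, not a definition taken from the paper: a routing counts as "degraded" as soon as it uses a link between the Subordinator and the Customer or Performer, and as "normal" if it stays within the coordination loop plus the Subordinator-Coordinator links. The names `NORMAL_LINKS`, `DEGRADED_LINKS` and `management_mode` are hypothetical.

```python
# Agents: 1 Customer, 2 Coordinator, 3 Performer, 4 Subordinator.
NORMAL_LINKS = {(1, 2), (2, 1), (2, 3), (3, 2), (4, 2), (2, 4)}
DEGRADED_LINKS = {(1, 4), (3, 4), (4, 1), (4, 3)}


def links(route):
    """Return the directed links traversed by a routing given as agent numbers."""
    return list(zip(route, route[1:]))


def management_mode(route):
    """Classify a routing by the kinds of links it activates."""
    used = set(links(route))
    if used & DEGRADED_LINKS:
        return "degraded"
    if used <= NORMAL_LINKS:
        return "normal"
    return "unclassified"  # e.g. a direct customer-performer link


normal_route = [1, 2, 4, 2, 3, 2, 4, 2, 1]  # complete routing quoted in the text
degraded_loop = [4, 3, 1, 4]                # Subordinator acting as Coordinator

print(management_mode(normal_route))
print(management_mode(degraded_loop))
```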


6. CONCLUSION

A special language for intelligent enterprise evolutionary
modeling is being constructed on the basis of the introduced
formal representation. It will contribute to the implementation
of the agent-oriented approach to enterprise reengineering,
related to the development and use of new integrated software
tools combining the capacities of groupware and liveware.

ABSTRACT

We  consider  a  class  of  discrete  resource  allocation  problems  which 
are  hard  due  to  the  combinatorial  explosion  of  the  feasible 
allocation  search  space.  In  addition,  if  no  closed-form  expressions 
are  available  for  the  cost  function  of  interest  in  a  stochastic 
environment,  one  needs  to  estimate  the  cost  function  through  direct 
on-line  observation  or  through  simulation.  We  will  present  an 
optimization  scheme  driven  by  ordinal  comparisons  of  cost 
estimates  and  show  that  it  converges  in  probability  to  the  global 
optimum.  An  important  feature  of  this  scheme  is  that  it  exploits  the 
fast  convergence  properties  of  ordinal  comparisons,  as  well  as 
eliminating  the  need  for  "step  size"  parameters  whose  selection  is 
always  difficult  in  optimization  algorithms.  An  application  to  a 
stochastic  discrete  resource  allocation  problem  is  included. 


EXTENDED  SUMMARY 

Discrete optimization problems often arise in the context of
resource allocation. In the basic model we consider here,
there are K identical resources to be allocated over N user
classes so as to optimize some system performance measure
(objective function). Let the resources be sequentially
indexed so that the "state" or "allocation" is represented by
the K-dimensional vector $s = [s_1, \ldots, s_K]$, where
$s_j \in \{1, \ldots, N\}$ is the user class index assigned to
resource j. Let S be the finite set of feasible resource
allocations

$S = \{[s_1, \ldots, s_K] : s_j \in \{1, \ldots, N\}\}$

where "feasible" means that the allocation may have to be
chosen to satisfy some basic requirements such as stability or
fairness. Let $L_i(s)$ be the class i cost associated with the
allocation vector s. The class of resource allocation problems
we consider is formulated as:

(RA)  $\min_{s \in S} \sum_{i=1}^{N} \beta_i L_i(s)$

where $\beta_i$ is a weight associated with user class i. (RA) is a
special  case  of  a  nonlinear  integer  programming  problem  and 
is  in  general  NP-hard.  However,  in  some  cases,  depending 
upon  the  form  of  the  objective  function  (e.g.,  separability, 
convexity),  efficient  algorithms  based  on  finite-stage 
dynamic  programming  or  generalized  Lagrange  relaxation 
methods  are  known  (see  [1]  for  a  comprehensive  discussion 
on  aspects  of  deterministic  resource  allocation  algorithms). 
Alternatively,  if  no  a  priori  information  is  known  about  the 
structure  of  the  problem,  then  some  form  of  a  search 
algorithm  is  employed  (e.g.,  Simulated  Annealing,  Genetic 
Algorithms). 
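
To make the size of the search space concrete, the sketch below enumerates every allocation for a tiny instance and picks the one with the smallest weighted cost. It is an illustration only: the class cost $1/(1 + n_i)$, where $n_i$ is the number of resources given to class i, and the weights are hypothetical choices, and every allocation is treated as feasible. Exhaustive search visits $N^K$ allocations, which is why it stops being practical very quickly.

```python
from collections import Counter
from itertools import product


def total_cost(allocation, weights):
    """Weighted sum of class costs; class i costs 1 / (1 + n_i) (assumed form)."""
    counts = Counter(allocation)
    return sum(w / (1 + counts[i]) for i, w in enumerate(weights, start=1))


def brute_force(num_resources, weights):
    """Enumerate all N**K allocations and return the cheapest one."""
    classes = range(1, len(weights) + 1)
    space = list(product(classes, repeat=num_resources))
    best = min(space, key=lambda s: total_cost(s, weights))
    return best, total_cost(best, weights), len(space)


best, cost, size = brute_force(num_resources=4, weights=[1.0, 2.0, 3.0])
print(size)            # number of allocations examined (3**4)
print(Counter(best))   # how many resources each class receives
print(cost)
```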

In  general,  the  system  we  consider  operates  in  a 
stochastic  environment;  for  example,  users  may  request 
resources  at  random  time  instants  or  hold  a  particular 
resource for a random period of time. In this case, $L_i(s)$ in
(RA) becomes a random variable and it is usually replaced
by $E[L_i(s)]$. Moreover, we wish to concentrate on complex
systems for which no closed-form expressions for $L_i(s)$ or
$E[L_i(s)]$ are available. Thus, $E[L_i(s)]$ must be estimated
through  Monte  Carlo  simulation  or  by  direct  measurements 
made  on  the  actual  system.  Problem  (RA)  then  becomes  a 
stochastic  optimization  problem  over  a  discrete  state  space. 

While  the  area  of  stochastic  optimization  over 
continuous  decision  spaces  is  rich  and  usually  involves 
gradient-based  techniques  as  in  several  well-known 
stochastic  approximation  algorithms  [2]-[3],  the  literature  in 
the  area  of  discrete  stochastic  optimization  is  relatively 
limited.  Most  known  approaches  are  based  on  some  form  of 
random  search  with  the  added  difficulty  of  having  to  estimate 
the  cost  function  at  every  step.  Such  algorithms  have  been 
recently proposed by Yan and Mukai [4] and Gong et al. [5].
Another  recent  contribution  to  this  area  involves  the  ordinal 
optimization  approach  presented  in  [6].  Among  other 
features,  this  approach  is  intended  to  exploit  the  fact  that 
ordinal  estimates  are  particularly  robust  with  respect  to 
estimation  noise  compared  to  cardinal  estimates;  that  is, 
estimating  the  correct  order  of  two  costs  based  on  noisy 
measurements  is  much  easier  than  estimating  the  actual 
values  of  these  costs.  The  implication  is  that  convergence  of 
such  algorithms  is  substantially  faster.  These  recent 
contributions  are  intended  to  tackle  stochastic  optimization 
problems  of  arbitrary  complexity,  which  is  one  reason  that 
part  of  the  ordinal  optimization  approach  in  [6]  includes  a 
feature  referred  to  as  "goal  softening".  On  the  other  hand, 
exploiting  the  structure  of  some  resource  allocation  problems 
can  yield  simpler  optimization  schemes  which  need  not 
sacrifice  full  optimality.  This  is  the  approach  we  take  in 
tackling problems of the form (RA).
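
The robustness of ordinal estimates can be checked with a small Monte Carlo experiment. The sketch below is not taken from [6]; it assumes two hypothetical costs with means 1.0 and 1.5 observed through Gaussian noise of standard deviation 2.0, and compares how often a short run orders the two costs correctly with how often it estimates one cost to within 5% of its true value.

```python
import random


def compare_ordinal_and_cardinal(n_samples, trials=2000, seed=0):
    """Return (P[correct order], P[value within tolerance]) from short runs."""
    rng = random.Random(seed)
    mean_a, mean_b, noise_sd = 1.0, 1.5, 2.0  # assumed costs and noise level
    tolerance = 0.05                          # "accurate" = within 5% of mean_a
    order_ok = 0
    value_ok = 0
    for _ in range(trials):
        est_a = sum(rng.gauss(mean_a, noise_sd) for _ in range(n_samples)) / n_samples
        est_b = sum(rng.gauss(mean_b, noise_sd) for _ in range(n_samples)) / n_samples
        order_ok += est_a < est_b                             # ordinal question
        value_ok += abs(est_a - mean_a) <= tolerance * mean_a  # cardinal question
    return order_ok / trials, value_ok / trials


p_order, p_value = compare_ordinal_and_cardinal(n_samples=20)
print(p_order, p_value)
```

Under these assumptions the ordering is recovered in the large majority of runs, while the value estimate is rarely accurate at the same run length, which is the qualitative effect the paragraph describes.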

In  this  work,  we  first  consider  the  deterministic  version 
of  problem  (RA)  and  provide  a  necessary  and  sufficient 
condition  for  global  optimality,  based  on  which  we  develop 
an  optimization  algorithm.  We  analyze  the  properties  of  this 
algorithm  and  show  that  it  yields  a  globally  optimal 
allocation  in  a  finite  number  of  steps.  We  point  out  that, 
unlike  resource  allocation  algorithms  presented  in  [1],  an 
important  feature  of  the  proposed  algorithm  is  that  every 
allocation  in  the  optimization  process  remains  feasible,  so 
that  our  scheme  can  be  used  on  line  to  adjust  allocations  as 
operating  conditions  (e.g.,  system  parameters)  change  over 
time.  Next,  we  address  the  stochastic  version  of  the  resource 
allocation  problem.  By  appropriately  modifying  the 
deterministic  algorithm,  we  obtain  a  stochastic  optimization 
scheme.  We  analyze  its  properties  treating  it  as  a  Markov 
process  and  prove  that  it  converges  in  probability  to  a 
globally  optimal  allocation  under  mild  conditions. 

Details  are  provided  in  a  full  paper  [7].  Here,  we  limit 
ourselves  to  pointing  out  two  features  of  the  resource 
allocation  scheme  we  analyze  which  are  worth  noting 
because  of  their  practical  implications.  All  iterative 
reallocation  steps  are  driven  by  ordinal  comparisons,  which, 
as  mentioned  earlier,  are  particularly  robust  with  respect  to 
noise in the estimation process. Consequently, (i) as in other
ordinal optimization schemes, convergence is fast because
short estimation intervals are adequate to guide allocations
towards the optimal, and (ii) there is no need for "step size"
or  "scaling"  parameters  which  arise  in  algorithms  driven  by 
cardinal  estimates  of  derivatives  or  finite  differences; 
instead,  based  on  the  result  of  comparisons  of  various 
quantities,  allocations  are  updated  by  reassigning  one 
resource  with  respect  to  the  current  allocation.  This  avoids 
the  difficult  practical  problem  of  selecting  appropriate  values 
for  these  parameters,  which  are  often  crucial  to  the 
convergence  properties  of  the  algorithm. 
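
The idea of updating an allocation by reassigning a single resource on the basis of a comparison can be sketched as follows. This is emphatically not the algorithm analyzed in [7], whose details are not given here; it is a generic comparison-driven local search, with a hypothetical class cost $1/(1 + n_i)$ and hypothetical names, meant only to show that such a scheme needs no step-size parameter and that every intermediate allocation remains a valid assignment of all K resources.

```python
import random
from collections import Counter


def noisy_cost(allocation, weights, noise_sd, rng):
    """Observed cost: assumed exact cost plus zero-mean Gaussian noise."""
    counts = Counter(allocation)
    exact = sum(w / (1 + counts[i]) for i, w in enumerate(weights, start=1))
    return exact + rng.gauss(0.0, noise_sd)


def reassign_search(allocation, weights, noise_sd=0.0, sweeps=20, seed=0):
    """Move one resource at a time whenever a comparison favours the move."""
    rng = random.Random(seed)
    current = list(allocation)
    classes = range(1, len(weights) + 1)
    for _ in range(sweeps):
        for j in range(len(current)):
            for new_class in classes:
                if new_class == current[j]:
                    continue
                candidate = current[:j] + [new_class] + current[j + 1:]
                # Only the order of the two estimates matters, not their values.
                if (noisy_cost(candidate, weights, noise_sd, rng)
                        < noisy_cost(current, weights, noise_sd, rng)):
                    current = candidate
    return current


weights = [1.0, 2.0, 3.0]
exact_result = reassign_search([1, 1, 1, 1], weights)            # noise-free run
noisy_result = reassign_search([1, 1, 1, 1], weights, noise_sd=0.3)
print(Counter(exact_result), Counter(noisy_result))
```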
ABSTRACT

In  the  study  of  discrete-event  dynamic  systems  (DEDS), 
accurate  analytical  models  are  generally  either  unavailable  or  too 
complex  to  evaluate  numerically.  Thus,  the  determination  of  optimal 
control  policies  often  entails  the  simulation  of  many  alternative 
systems  in  parallel.  To  reduce  the  computational  burden,  we 
investigate  the  use  of  ordinal  optimization.  This  approach  focuses  on 
determining  control  policies  that  perform  relatively  well  (although 
not  necessarily  optimally),  without  necessarily  obtaining  accurate 
performance  measures. 

In  this  paper  we  study  three  techniques  for  the  ordinal 
optimization  of  DEDS,  namely  the  use  of  short  simulation  runs,  crude 
analytical  models,  and  imprecise  simulation  models. 

KEYWORDS:  optimization,  ordinal  optimization,  simulation, 
standard  clock 

1.  INTRODUCTION 

The  motivation  underlying  ordinal  optimization  [1]  is  that 
finding  the  optimal  solution  (or  control  policy)  is  often  too 
costly  or  time  consuming,  although  a  suboptimal  solution  (that 
may  be  found  quite  easily)  may  provide  sufficiently  good 
performance. 

In  this  paper  we  study  three  techniques  for  the  ordinal 
optimization  of  DEDS.  First,  we  demonstrate  that  remarkably 
accurate  performance  rankings  can  be  obtained  on  the  basis  of 
short  simulation  runs.  In  particular,  for  our  example  of 
admission  control  in  communication  networks,  we 
demonstrate  that  the  high  degree  of  correlation  introduced  by 
the  Standard  Clock  parallel  simulation  technique  provides 
considerably  better  performance  (i.e.,  faster  convergence  to 
accurate  rankings)  than  the  use  of  Common  Random 
Numbers.  Our  second  approach  is  the  use  of  crude  analytical 
models  that  capture  the  crucial  aspects  of  system  behavior. 
Again,  remarkably  accurate  policy  rankings  are  achieved  (in 
this  case  without  using  simulation  at  all),  even  though  such 
crude  models  provide  poor  estimates  of  actual  performance;  if 
accurate  performance  measures  are  needed,  it  is  sufficient  to 
simulate  performance  for  only  the  best  few  policies 
determined  in  this  manner.  Our  third  approach  is  the  use  of 
imprecise  simulation  models  that,  like  the  crude  analytical 
models,  incorporate  the  salient  features  of  the  communication 
network model.

2. THE STANDARD CLOCK TECHNIQUE

The  SC  simulation  technique  [2,  3,  4],  which  permits  the 
simultaneous  evaluation  of  system  performance  under  a  large 
number  of  control  policies,  is  a  primary  component  of  our 
ordinal  optimization  techniques.  Although  the  basic  SC 
simulation  technique  is  limited  to  exponential  interevent 
times,  the  technique  has  been  extended  to  some  more  general 
examples  as  well  [5],  [6],  [7]. 

In  SC  simulation,  each  event  is  determined  by  two 
random  numbers,  one  to  specify  the  timing  of  the  next  event, 
and  the  other  to  specify  its  type.  It  is  possible  that  an  event 
determined  in  this  manner  turns  out  to  be  infeasible  (e.g.,  a 
departure  from  an  empty  system).  The  interevent  time  of  such 
a  "fictitious"  event  is  used  to  update  the  system  time  as  if  the 
event  were  "real"  and  did  in  fact  occur,  but  no  state  change 
occurs  (the  fictitious  event  is  discarded). 

The  improved  efficiency  of  the  SC  method  is  achieved  by 
using  the  resulting  sequence  of  (interevent  time,  event  type) 
pairs,  known  as  the  clock  sequence,  to  simultaneously 
generate  sample  paths  for  a  number  of  structurally  similar,  but 
parametrically  different,  systems.  This  reduction  in  the 
number  of  events  that  has  to  be  generated  has  a  dramatic  effect 
on  the  overall  simulation  time  because  the  generation  of 
events  is  considerably  more  time  consuming  than  the 
consequent  updating  of  system  state  [7].  Furthermore,  the  use 
of  a  common  event  sequence  introduces  a  high  level  of 
correlation  among  the  parallel  simulations,  greatly 
accelerating  the  determination  of  good  control  policies  by 
means  of  ordinal-optimization  techniques,  as  we  demonstrate 
in  this  paper. 
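
The mechanism can be sketched on a deliberately simple system. The example below is an assumption-laden illustration, not the simulator used in the paper: it drives several single-server queues that differ only in buffer size from one common clock sequence, with hypothetical arrival and service rates of 1. Each event uses two random numbers (one for the interevent time, one for the type), and a departure from an empty queue is treated as a fictitious event that advances time without changing the state.

```python
import random


def standard_clock(buffer_sizes, lam=1.0, mu=1.0, n_events=10000, seed=0):
    """Simulate several finite-buffer queues from one shared clock sequence."""
    rng = random.Random(seed)
    total_rate = lam + mu                  # rate at which clock events occur
    states = [0] * len(buffer_sizes)       # customers in each parallel system
    blocked = [0] * len(buffer_sizes)      # blocked arrivals per system
    arrivals = 0
    clock = 0.0
    for _ in range(n_events):
        clock += rng.expovariate(total_rate)           # first number: timing
        is_arrival = rng.random() < lam / total_rate   # second number: type
        if is_arrival:
            arrivals += 1
        for k, capacity in enumerate(buffer_sizes):
            if is_arrival:
                if states[k] < capacity:
                    states[k] += 1
                else:
                    blocked[k] += 1
            elif states[k] > 0:
                states[k] -= 1
            # Otherwise the departure is fictitious for system k: no change.
    return [b / arrivals for b in blocked], clock


blocking, elapsed = standard_clock([1, 2, 4, 8])
print(blocking, elapsed)
```

Because all systems see the same event sequence, a larger buffer can never block more arrivals than a smaller one on the same sample path, so the estimated blocking probabilities are ordered consistently even for short runs.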

3.  THE  ADMISSION-CONTROL  PROBLEM 
IN  CIRCUIT-SWITCHED  NETWORKS 

We  have  studied  the  problem  of  admission-control  in 
circuit-switched,  multihop,  wireless  networks.  In  our 
examples,  a  circuit  is  established  over  a  predetermined  path 
between  the  originator  of  a  call  and  its  destination  node  for  the 
duration  of  the  call.  Contention-free  channel  access  is 
implemented  by  means  of  frequency-division  multiple  access 
(FDMA);  thus  a  necessary  and  sufficient  condition  for  a  call  to 
be  established  is  that  a  transceiver  is  available  at  every  node  in 
its  circuit.  The  model  is  described  in  greater  detail  in  [8]  and 
[9]. 

Figure  1  shows  an  example  of  a  ten-node  wireless 
network  that  supports  circuit-switched  voice.  The  multihop 
paths  corresponding  to  each  of  J  =  5  circuits  are  distinguished 
by  different  shadings.  The  dashed  lines  connect  nodes  that  are 
within  communication  range  of  each  other,  but  that  do  not 
support any of the five circuits.

Fig. 1 - An example circuit-switched voice network.

Calls for circuit j are generated according to a Poisson
process with rate $\lambda_j$, and their durations are exponentially
distributed with parameter $\mu_j$. A vector description of circuit j
in terms of the nodes it traverses is given by
$c_j = (c_{j1}, c_{j2}, \ldots, c_{jN})$, where

$c_{ji} = \begin{cases} 1, & \text{if circuit } j \text{ traverses node } i \\ 0, & \text{otherwise} \end{cases}$

and N is the number of nodes in the network. The state of the
system with J source-destination pairs, and hence J circuits, is
described by the vector $x = (x_1, x_2, \ldots, x_J)$, where $x_j$ is the
number of calls currently active on circuit j, each of which is
referred  to  as  a  call  of  type  j.  A  central  controller  makes  the 
decisions  on  whether  or  not  to  accept  calls  based  on  perfect 
knowledge  of  the  number  of  calls  of  each  type  that  are 
currently  in  progress  (i.e.,  the  system  state  x),  and  hence  the 
set  of  resources  that  are  available  for  new  calls.  The 
transceivers  needed  to  establish  a  circuit  are  acquired 
simultaneously  when  the  call  arrives,  and  are  released 
simultaneously  when  the  call  is  completed.  Calls  are  blocked 
when  one  or  more  nodes  along  the  path  do  not  have  a 
transceiver  available,  or  when  a  decision  is  made  not  to  accept 
a  call  despite  the  availability  of  transceivers.  Blocked  calls  are 
lost  from  the  system.  These  assumptions,  coupled  with  the 
class  of  coordinate-convex  admission-control  policies 
discussed  below,  lead  to  a  mathematically  tractable 
description  of  system  performance. 

The number of transceivers at node i is denoted by $T_i$. No
more than $T_i$ calls can simultaneously use the resources at
node i, i.e.,

$\sum_{j=1}^{J} x_j c_{ji} \le T_i, \quad i = 1, \ldots, N.$

These inequalities, which are termed the "system constraints,"
limit the state space $\Omega_0$ in which x is allowed to take values.
We  refer  to  a  system  that  is  solely  constrained  by  these  system 
constraints  as  an  "uncontrolled"  system. 

Network  performance  can  often  be  improved  by 
administering  an  admission-control  policy  that  blocks  some 
calls  even  though  resources  are  available  [9].  We  define  the 
admission-control policy, denoted by $\Omega$, in terms of a set of
circuit "thresholds" and a set of "linear-combination
constraints." Thresholds restrict the number of calls that will
be admitted to the individual circuits, and can be expressed as

$x_j \le X_j$ = threshold on circuit j,  $j = 1, \ldots, J$.

The linear-combination constraints are defined as

$\sum_{j \in S_l} x_j \le Y_l$ = threshold on the total number of calls
of the types that are members of set $S_l$

for suitable subsets $S_l$ of the set of all call types. In particular,
for the example of Fig. 1, three linear-combination constraints
are imposed by node 5 ($x_1 + x_3 \le Y_1$, $x_1 + x_5 \le Y_3$, and
$x_3 + x_5 \le Y_4$) and two are imposed by node 7
($x_1 + x_4 \le Y_2$ and $x_4 + x_5 \le Y_5$). Thus, the control policy
is given by $\Omega = \{X_1, \ldots, X_J, Y_1, \ldots\}$. Now the problem
is the determination of the optimal admission-control policy
$\Omega^*$, i.e., the values of the $X_j$ and the $Y_l$ that yield the
optimal network performance.

The  use  of  this  form  of  control  policy  assures  us  that  the 
state  space  is  "coordinate  convex"  [10].  A  coordinate-convex 
policy  is  specified  in  terms  of  the  set  of  admissible  states; 
completed  calls  are  not  blocked  from  leaving,  on-going  calls 
do  not  get  rerouted,  and  new  calls  are  admitted  with 
probability  1  if  the  state  to  be  entered  is  in  the  admissible 
region. 
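
A policy of this form can be checked mechanically. The sketch below assumes the five pair constraints listed above for the example of Fig. 1 and uses illustrative threshold values; it does not model the per-node system constraints, since the node-circuit incidence of Fig. 1 is not available here. Coordinate convexity is tested using its usual definition: whenever a state is admissible, any state obtained by removing one active call is admissible too. The function names are hypothetical.

```python
from itertools import product

# Circuit pairs summed under Y1..Y5, as listed in the text for nodes 5 and 7.
PAIRS = [(1, 3), (1, 4), (1, 5), (3, 5), (4, 5)]


def is_admissible(x, thresholds, combo_limits):
    """True if state x satisfies every threshold and every pair constraint."""
    if any(xj < 0 or xj > limit for xj, limit in zip(x, thresholds)):
        return False
    return all(x[a - 1] + x[b - 1] <= y for (a, b), y in zip(PAIRS, combo_limits))


def is_coordinate_convex(thresholds, combo_limits):
    """Check that removing one call from an admissible state stays admissible."""
    for x in product(*(range(limit + 1) for limit in thresholds)):
        if not is_admissible(x, thresholds, combo_limits):
            continue
        for j, xj in enumerate(x):
            if xj > 0:
                smaller = x[:j] + (xj - 1,) + x[j + 1:]
                if not is_admissible(smaller, thresholds, combo_limits):
                    return False
    return True


thresholds = (6, 6, 6, 6, 6)     # X1..X5 (illustrative values)
combo_limits = (8, 8, 8, 8, 8)   # Y1..Y5 (illustrative values)
print(is_admissible((3, 0, 2, 0, 1), thresholds, combo_limits))
print(is_admissible((6, 0, 6, 0, 0), thresholds, combo_limits))
print(is_coordinate_convex(thresholds, combo_limits))
```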

Under  our  assumption  of  Poisson  arrival  statistics,  the  use 
of  coordinate-convex  policies  results  in  the  so-called  product- 
form  characterization  of  the  system  [11,  12].  To  determine  the 
optimal  policy,  performance  must  be  evaluated  for  many 
candidate  policies.  In  view  of  the  computationally  intensive 
nature  of  the  evaluation  of  product-form  solutions,  it  is 
desirable  to  develop  efficient  simulation  approaches  for  this 
problem. 

4.  SC  SIMULATION  OF  CIRCUIT- 
SWITCHED  NETWORKS 

We have used SC techniques to evaluate the performance
of the circuit-switched network of Fig. 1 under a number of
different admission-control policies. The discrete parameters
of interest in this case are the circuit thresholds $X_j$ and the
linear-combination constraints $Y_l$. The events that must be
generated are arrivals and departures. An arrival to circuit j is
denoted by $a_j$ and occurs at a rate of $\lambda_j$. The departure rate
for each active call on circuit j is $\mu_j$; thus, if there are $x_j$
active calls on circuit j, the departure rate for calls of type j
is $x_j \mu_j$. This situation is translated to events in the
simulation as follows. We define $X_{j,\max}$ to be the maximum
value $X_j$ can have over the entire set of policies. Thus

$X_{j,\max} = \min_{i:\, c_{ji} = 1} (T_i).$

Because there may be up to $X_{j,\max}$ active calls on circuit j, we
must consider $X_{j,\max}$ different departure events $d_{jn}$
($n = 1, \ldots, X_{j,\max}$) for circuit j, namely

$d_{jn} \Rightarrow$ feasible departure from circuit j if $x_j \ge n$,
otherwise fictitious event.

Following the usual technique used in uniformization [13], the
maximal rate of this system is

$\Lambda = \sum_{j=1}^{J} (\lambda_j + X_{j,\max}\, \mu_j).$

The upper part of Fig. 2 shows the "ratio yardstick" for
the network of Fig. 1, where the maximum threshold on each
circuit is three. To determine the event type, a random
number, uniformly distributed over [0,1], is generated. The
event type (e.g., $a_j$ or $d_{jn}$) is determined by the interval into
which the random number falls.^1 This ratio yardstick
corresponds to operation of the network at the maximal rate,
and it can generate all events that are needed for the
simulation of systems with control policies in which the circuit
thresholds do not exceed 3. In our example $\lambda_j = \mu_j = 1$, which
implies that all events are equally likely; therefore, all the
intervals have the same width. All arrival events in this
system are real, and the corresponding calls are accepted as
long as their acceptance does not violate any of the system
constraints or control constraints. However, departure events
are fictitious if an insufficient number of calls of that type are
currently active, as discussed above. For example, referring
again to the upper part of Fig. 2, an event of type $d_{13}$ will be
fictitious (and hence ignored, although time will be updated) if
there are fewer than three calls of type 1 currently active.

^1 The computational effort involved in determining the event type by
means of the alias method [14] is independent of the number of event
types, thus making this method suitable for complex systems.

Upper part: ratio yardstick based on maximal rates; threshold on
all circuits = 3; $\Lambda = 20$; $\lambda_j = \mu_j = 1$ (all events equally
likely). Events: arrivals $a_1, \ldots, a_5$; departures
$d_{11}, d_{12}, d_{13}, \ldots, d_{51}, d_{52}, d_{53}$.

Lower part: custom yardstick for the controlled system with
thresholds X = (1, 3, 2, 3, 1);
$\Lambda' = \sum_j (\lambda_j + X_j \mu_j) = 15$; $\lambda_j = \mu_j = 1$ (all
events equally likely). Events: arrivals $a_1, \ldots, a_5$;
departures $d_{11}, d_{21}, d_{22}, d_{23}, d_{31}, d_{32}, d_{41}, d_{42}, d_{43}, d_{51}$.

Fig. 2 - Ratio yardsticks for the circuit-switched network.

To  reduce  the  number  of  fictitious  events  (and  thus 
further  improve  the  efficiency  of  the  simulations)  we  have 
considered  the  use  of  "custom"  ratio  yardsticks,  which  are 
constructed  to  conform  with  the  particular  parameters  of  each 
sample  path.  The  lower  part  of  Fig.  2  shows  a  custom  ratio 
yardstick  for  an  example  in  which  the  circuit  thresholds  have 
been  lowered  to  {1,3,2,3,1}.  A  number  of  events  have  been 
eliminated  because  they  will  always  be  fictitious,  thus 
resulting in a reduction in the event rate from $\Lambda = 20$ to
$\Lambda' = 15$; e.g., with the threshold on circuit 1 reduced to 1,
the departure events $d_{12}$ and $d_{13}$ will always be fictitious.
Clearly,
by  eliminating  them  from  the  ratio  yardstick  we  can  increase 
the  percentage  of  real  events.  The  remaining  events  are  again 
equally  likely,  although  there  are  only  15  of  them  instead  of 
20. 
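
The two yardsticks can be reproduced with a few lines of code. The sketch below assumes unit arrival and departure rates as in the example, builds the list of events for a given set of circuit thresholds, and maps a uniform random number to an event by searching the cumulative interval boundaries. A binary search is used here for brevity instead of the alias method mentioned in the footnote; the function names are hypothetical.

```python
import bisect


def build_yardstick(thresholds, lam=1.0, mu=1.0):
    """Return (event labels, cumulative probabilities, total event rate)."""
    events = [(f"a{j}", lam) for j in range(1, len(thresholds) + 1)]
    for j, limit in enumerate(thresholds, start=1):
        # One departure event d_jn per possible active call n = 1..threshold.
        events += [(f"d{j}{n}", mu) for n in range(1, limit + 1)]
    total_rate = sum(rate for _, rate in events)
    cumulative, running = [], 0.0
    for _, rate in events:
        running += rate / total_rate
        cumulative.append(running)
    return [name for name, _ in events], cumulative, total_rate


def draw_event(labels, cumulative, u):
    """Map a uniform number u in [0, 1) to the event whose interval holds it."""
    index = bisect.bisect_right(cumulative, u)
    return labels[min(index, len(labels) - 1)]  # guard against rounding at 1.0


full_labels, full_cum, full_rate = build_yardstick([3, 3, 3, 3, 3])
custom_labels, custom_cum, custom_rate = build_yardstick([1, 3, 2, 3, 1])
print(full_rate, custom_rate)
print(draw_event(full_labels, full_cum, 0.0))
```

With all thresholds equal to 3 the yardstick holds 20 equally likely events; lowering the thresholds to (1, 3, 2, 3, 1) leaves 15, and events such as `d12` and `d13` no longer appear.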

Despite  the  potential  advantages  of  custom  ratio 
yardsticks,  we  have  found  in  our  SC  simulations  that  the 
added  burden  of  maintaining  and  accessing  separate  custom 
yardsticks  for  every  sample  path  outweighs  the  reduction  in 
the  number  of  fictitious  events  provided  by  their  use  [15]. 
Furthermore,  the  use  of  custom  ratio  yardsticks  reduces  the 
level  of  correlation  among  the  sample  paths,  thus  resulting  in 
poorer  ordinal  rankings,  as  is  shown  in  Section  5.2.4. 

5.  ORDINAL  OPTIMIZATION  USING 
SHORT  SIMULATION  RUNS 

We  have  performed  SC  simulations  of  the  network  shown 
in  Fig.  1,  with  the  goal  of  determining  the  optimal  policy,  by 
using  the  model  discussed  in  Sections  3  and  4.  Eight 
transceivers  are  assumed  present  at  each  node  in  the  network. 
Recall from Section 3 that a policy for this network can be
written as $\Omega = \{X_1, \ldots, X_5, Y_1, \ldots, Y_5\}$. For $\lambda_j/\mu_j = 4.0$,2 we
have simultaneously evaluated the 120 different control
policies $\{X_1, 6, 6, 6, X_5, 8, 8, Y_3, 8, 8\}$, where $X_1, X_5 = 0, \ldots, 6$;
$Y_3 = 0, \ldots, 8$; $Y_3 \le X_1 + X_5$; $X_1 \le Y_3$; and $X_5 \le Y_3$.

2 Blocking probability does not depend on the individual values of $\lambda_j$
and $\mu_j$, but only on the ratios $\rho_j = \lambda_j/\mu_j$.
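
The policy set described above can be enumerated directly. The sketch below assumes the three constraints are non-strict (less than or equal), which is the reading that yields exactly 120 policies. The tuple layout and the name `candidate_policies` are illustrative.

```python
def candidate_policies():
    """Enumerate policies (X1, 6, 6, 6, X5, 8, 8, Y3, 8, 8).

    Assumed constraints: X1, X5 in 0..6; Y3 in 0..8;
    Y3 <= X1 + X5; X1 <= Y3; X5 <= Y3.
    """
    policies = []
    for x1 in range(0, 7):
        for x5 in range(0, 7):
            for y3 in range(0, 9):
                if y3 <= x1 + x5 and x1 <= y3 and x5 <= y3:
                    policies.append((x1, 6, 6, 6, x5, 8, 8, y3, 8, 8))
    return policies

policies = candidate_policies()
print(len(policies))
```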
[Figure 3 plot; horizontal axis: policies ranked on exact blocking probability.]

Fig.  3  -  Probability  of  blocking  across  the  range  of  policies. 

5.1.  Evaluation  of  Voice-Call  Blocking  Probability 

In  Fig.  3  we  show  the  exact  and  simulated  voice-call 
blocking  probability  associated  with  the  120  control  policies. 
The  exact  results  are  determined  numerically  from  the 
product-form  solution,  and  simulated  results  are  based  on  runs 
of $10^4$ and $10^6$ voice-arrival events. The horizontal axis is
simply  the  ordering  from  the  best  (minimum  blocking 
probability)  policy  to  the  worst,  based  on  the  exact  model; 
thus  blocking  probability  is  a  monotonically  nondecreasing 
function  of  the  horizontal  axis.  It  is  apparent  from  the 
closeness  of  the  curves  in  Fig.  3  that  the  results  of  the  longer 
simulation  are  extremely  accurate;  the  simulation  error  is 
never  more  than  0.2%.  It  is  also  apparent  that  the  shorter 
simulation  is  not  long  enough  to  predict  blocking  probability 
accurately. 

Blocking probability found in SC simulations compared to
the exact value, $\rho = 4$.

5.2  Ordinal Ranking of Policies in Terms of Voice-Call
Blocking Probability

In  Fig.  4  we  compare  ordinal  rankings  of  the  voice-call 
blocking  probability  obtained  from  two  SC  simulations  to  the 
exact  rankings.  The  two  SC  simulations  differed  only  in  their 
durations, which were based on $10^4$ and $10^6$ voice-call arrivals.
The ordinal ranking of the simulation results shows
remarkable agreement with the exact ordinal ranking (ideally
the  curve  would  be  a  straight  line  with  unit  slope).  For  long 
simulations  this  is  not  surprising;  however,  it  is  remarkable 
that  the  short  simulation  placed  eight  of  the  top  ten  policies  in 
the  top  ten  positions.  This  agreement  is  achieved  despite  the 
insensitivity  of  blocking  probability  to  the  policy  used.  It  is 
apparent  from  Fig.  3  that  there  is  little  sensitivity  to  the  policy 
that  is  used  when  blocking  probability  is  the  only  performance 
measure  of  interest.  For  example,  as  the  policies  are  examined 
from  the  best  to  the  80th  out  of  120,  the  blocking  probability 
increases  by  only  a  small  amount,  i.e.,  from  0.358  to  0.365. 

5.3  The Impact of Common Event Sequences on Rankings

The SC method uses a common event sequence for all
control policies; thus it permits the direct comparison of the
performance  of  these  policies  under  the  same  input  conditions. 
The benefits of the use of common event sequences for
comparing policies, as compared to the use of independent
brute force (BF) simulations, have been noted in [14, 16, 17,
18]. It was shown in [19, 20] that the use of different
random-number generator seeds can produce differences in
performance  in  our  problem  that  are  considerably  greater  than 
those  resulting  from  the  use  of  different  admission-control 
policies,  especially  for  short  simulation  runs. 

The  failure  of  the  independent  BF  simulations  to  provide 
good  rankings  for  short  simulation  runs  clearly  demonstrates 
the  benefits  obtained  by  using  a  common  event  sequence.  In 
our example, the use of independent event streams appears to
require a simulation duration about two orders of magnitude
greater than that required for SC simulation to obtain
comparable performance. Thus, in addition to the benefit of
simulation  speedup  (in  terms  of  the  time  required  to  simulate  a 
specified  number  of  events),  the  SC  technique's  use  of  a 
common  event  sequence  introduces  correlation  into 
experiments  so  that  accurate  rankings  are  obtained  early  in  the 
simulation.  Therefore,  highly  accurate  ordinal  rankings  can  be 
obtained  with  much  shorter  simulations  than  would  be 
required  with  independent  BF  simulation  runs. 

5.4  The  Use  of  Common  Random  Numbers,  but  Different 
Event  Sequences 

In  Section  4  we  noted  that  the  use  of  custom  alias  tables 
(custom  ratio  yardsticks)  may,  in  some  cases,  permit  higher 
efficiency  in  SC  simulations  because  of  the  consequent 
reduction  in  the  number  of  fictitious  events.  However,  when 
custom  alias  tables  are  used,  different  events  may  be  passed  to 
different  sample  paths,  even  though  the  same  random-number 
sequence  is  used  to  drive  all  of  the  parallel  experiments.  We 
now  address  the  quality  of  the  ordinal  rankings  that  are 
obtained  when  common  random  numbers  (CRN)  are  used  in 
conjunction  with  custom  alias  tables  to  simulate  the  same 
example  considered  above  for  SC  simulation  with  a  common 
event  stream  and  for  independent  BF  simulations.  In 
particular,  we  address  the  conjecture  that  the  use  of  a  common 
underlying  random-number  sequence  can  introduce  beneficial 
correlation  among  the  sample  paths,  even  though  the  event 
sequences  are  different.  It  was  recently  demonstrated  that  the 
use  of  CRN  maximizes  the  rate  of  convergence  for  ordinal 
comparison  in  systems  that  satisfy  "positive  quadrant 
dependence"3  [21];  however,  this  condition  is  not  satisfied  in 
our  problem. 

Figure 5 shows the blocking probability and ordinal
rankings for simulations of length $10^6$ events for the present
case of CRN simulations; results for six sample paths are
shown. A comparison with curves for the case of independent
experiments [19, 20] shows that little, if any, benefit is
achieved by using the CRN method. The rankings appear to
be comparable to those shown in Fig. 4 for the case of a
simulation of duration $10^4$ events.

Fig. 5 - Ordinal rankings for CRN simulations with custom alias
tables; six random-number generator seeds; simulation duration = $10^6$
events.

5.5  A  Performance  Measure  for  Ordinal  Rankings:  The 
Spearman  Rank  Correlation  Coefficient 

To facilitate the comparison of sets of rankings, it is
helpful to use a quantitative measure of the quality of a set of
rankings. To provide the basis for such a metric, let $\Xi_i$ be the
performance measure value (in this case blocking probability)
associated with policy $i$ ($1 \le i \le N$), based on a simulation run;
and let $\Psi_i$ be the exact performance measure value associated
with policy $i$ (where the index $i$ is arbitrary, i.e., not based on
performance measure values). Thus the pair $(\Xi_i, \Psi_i)$
represents the simulated and exact performance values
associated with policy $i$. Now define $R_i = \operatorname{rank} \Xi_i$ (where $R_i =
1$ means that policy $i$ is the policy with the best value of $\Xi$)
and $S_i = \operatorname{rank} \Psi_i$. One method of comparing these sets of
rankings is the use of Spearman's rank correlation coefficient
[22]

$$r_s = 1 - \frac{6 \sum_{i=1}^{N} (R_i - S_i)^2}{N^3 - N},$$

which provides a scalar measure of the "association" between
two sets of rankings. If all of the simulated rankings are
correct (i.e., $R_i = S_i$, $1 \le i \le N$), $r_s = 1$.

3 Two random variables $X \in \mathbb{R}$ and $Y \in \mathbb{R}$ are said to be positively
quadrant dependent if $\mathrm{Prob}[X > x, Y > y] \ge \mathrm{Prob}[X > x]\,\mathrm{Prob}[Y > y]$.

Figure 6 shows the Spearman rank curves for the three
approaches to simulation we have used, namely the
common-event, independent-event (BF), and CRN examples. Six
sample paths are shown for each case to demonstrate the
degree of variation among the results. (The Spearman
rankings for the CRN examples show much greater variation
among the different random seeds than those for the
common-event and independent-BF examples.) The use of custom alias
tables in SC simulations is detrimental, apparently because the
differences introduced into the experiments reduce the
correlation among them (as compared to the case of
common-event SC simulation), thus resulting in poor ordinal rankings
of policies. To obtain comparable ordinal rankings, it may be
necessary to run simulations that are several orders of
magnitude greater in length than those required for
common-event SC simulations. In that case, however, all of the
efficiency gained through the reduction of fictitious events
would be eliminated.

Fig. 6 - Quality of ordinal rankings of blocking probability, based on
Spearman's rank coefficient, for SC, BF, and CRN simulations of
various lengths.
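
The Spearman rank correlation coefficient used in this section can be computed with a few lines of code. The sketch below assumes the standard tie-free form, $r_s = 1 - 6\sum_i (R_i - S_i)^2 / (N^3 - N)$, and that smaller performance values (lower blocking probability) are better. The function names and the sample values are illustrative only.

```python
def ranks(values):
    """Rank values so that the smallest (best) gets rank 1; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(simulated, exact):
    """Spearman rank correlation between simulated and exact performance values."""
    n = len(simulated)
    r = ranks(simulated)
    s = ranks(exact)
    squared_diff = sum((a - b) ** 2 for a, b in zip(r, s))
    return 1.0 - 6.0 * squared_diff / (n ** 3 - n)

# Hypothetical blocking probabilities for four policies (not data from the paper).
simulated = [0.360, 0.358, 0.365, 0.362]
exact = [0.358, 0.359, 0.365, 0.361]
print(spearman(simulated, exact))
```

A value of 1 indicates identical rankings and a value of -1 indicates fully reversed rankings.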

6.  ORDINAL OPTIMIZATION USING
CRUDE  ANALYTICAL  MODELS  AND 
IMPRECISE  SIMULATION  MODELS 

In  this  section,  we  demonstrate  the  use  of  crude  analytical 
models  and  imprecise  simulation  models  for  ordinal 
optimization.  The  particular  problem  we  address  is  the 
determination  of  the  voice-call  admission-control  policy  that 
minimizes  the  delay  of  data  packets  in  integrated  voice/data 
networks,  subject  to  a  constraint  on  the  blocking  probability  of 
voice calls. Data traffic, which consists of fixed-length
packets with exponentially distributed interarrival times,
is added to the basic model of Fig. 1. The integrated network
model  is  described  fully  in  [7,  20],  where  it  is  demonstrated 
how  fixed-length  packets  can  be  incorporated  into  the  SC 
methodology. 

The  development  of  accurate  data-packet  delay  models  is 
a  difficult  problem.  Data-packet  delay  depends  not  only  on 
the  statistics  of  the  data-packet  process  itself,  but  also  on  the 
time-varying  properties  of  the  circuit-switched  voice-call 
process.  In  particular,  a  data  packet  can  be  successfully 
transmitted  only  if  a  transceiver  is  available  at  the  designated 
receiving  node  when  the  transmitting  node  transmits.  The 
availability  of  a  transceiver  for  reception  depends  on  the 
current  state  of  the  voice-call  process  at  that  node  because  data 
traffic  is  permitted  to  use  only  those  transceivers  that  are  not 
currently supporting an active voice call. Nevertheless, we
make the oversimplification of modeling the data-packet
process at node $i$ as an M/D/1 queueing system, for which the
delay (under admission-control policy $\Omega$) is


where $C_i$ is the "residual capacity" of node $i$ (the expected
number of transceivers available for data, i.e., not being used
for voice traffic), $\rho_i$ is the expected normalized load at node $i$,
and $\mu$ is set equal to 1 without loss of generality (i.e.,
implicitly, we assume that delay is measured in slots, where a
slot is the time required by a transmitter to transmit one
fixed-length data packet; all transceivers transmit at the same rate).
In the subsequent discussions, the delay metric we have used
is the average nodal delay taken over all nodes.
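
As a rough illustration of an M/D/1 delay estimate, the sketch below uses the textbook mean time in system for an M/D/1 queue (service time plus the Pollaczek-Khinchine waiting time), $T = 1/\mu + \rho / (2\mu(1-\rho))$, and returns infinity in saturation ($\rho \ge 1$). The way residual capacity enters is an assumption made only for this example: the hypothetical helper `average_nodal_delay` treats node $i$ as a single deterministic server of rate $\mu C_i$, which need not match the expression used by the authors.

```python
import math

def md1_delay(rho, mu=1.0):
    """Mean time in system for an M/D/1 queue with load rho and service rate mu."""
    if rho < 0:
        raise ValueError("load must be non-negative")
    if rho >= 1.0:
        return math.inf  # saturated: the queue grows without bound
    return 1.0 / mu + rho / (2.0 * mu * (1.0 - rho))

def average_nodal_delay(arrival_rates, residual_capacities, mu=1.0):
    """Average the per-node M/D/1 delay over all nodes.

    Assumption (not from the paper): node i behaves like one server whose
    rate is mu * C_i, so its normalized load is lambda_i / (mu * C_i).
    """
    delays = []
    for lam, capacity in zip(arrival_rates, residual_capacities):
        service_rate = mu * capacity
        rho = lam / service_rate if service_rate > 0 else math.inf
        delays.append(md1_delay(rho, service_rate))
    return sum(delays) / len(delays)

print(md1_delay(0.5))
print(average_nodal_delay([1.0, 1.0], [2.0, 4.0]))
```

A single saturated node makes the average infinite, which matches the observation in this section that the analytical estimate is infinite for policies that leave too little residual capacity.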

This  delay  model  is  deficient  in  several  ways.  In  addition 
to  the  fact  that  it  ignores  the  need  to  pair  transceivers  at  the 
transmitting  and  receiving  nodes,  it  also  ignores  other  critical 
aspects of network operation. Most significant is the implicit
assumption that the number of data packets that are serviced
per unit time at node $i$ is constant at $C_i$ (the fact that $C_i$ is not,
in general, integer-valued is a minor point here), whereas the
number of servers available in the real system varies, based on
the voice state. An accurate estimate of delay would have to
take into account not only the expected voice state (which
determines residual capacity) as we do in our model, but also
the fraction of time spent in each voice state as well as the
statistics of the duration of each state visit. Thus, not
surprisingly, the delay estimate based on this model is not
accurate. However, we demonstrate shortly that this estimate
does provide a remarkably accurate indication of the relative
performance of a large number of different policies (and thus
of their rankings), as confirmed by SC simulations.

In Fig. 7 we compare the ordinal rankings obtained from
four of our SC simulations of the integrated network to the
ordinal rankings that were obtained by using the simple
analytical M/D/1 model for delay. We again consider the
network of Fig. 1 with eight transceivers at each node, and
examine the same 120 policies as in the voice-call blocking
probability example; however, we plot our results only for the
best 87 policies because the estimated delay based on the
analytical M/D/1 model is infinite for the remaining policies
(the system is in saturation because the offered load is greater
than the residual capacity at one or more nodes). Two of the
simulations used the "receivers-assumed" model [7, 20],
which neglects the need to verify that a transceiver is actually
available at the intended receiving node. The other two used
the more-accurate "receivers-verified" model, under which a
data packet is not transmitted unless a transceiver is, in fact,
available at the intended receiving node. In all cases, the
duration of the simulation was $10^6$ voice arrivals.

Fig. 7 - A comparison of the ranking of the policies by SC
simulation and by the analytical delay estimate.

For all four runs shown, $\rho_j = \rho_v = 4$ ($j = 1, 2, \ldots, 5$) and
the loading on each of ten data queues is $\rho_d = 0.5$. For each
model (i.e., receivers-assumed and receivers-verified) we used
two different expected voice-call durations ($\lambda_v = 4.0$, $\mu_v = 1.0$)
and ($\lambda_v = 0.4$, $\mu_v = 0.1$). The data service rate $\mu_d$ was 10.0 in
all the simulations; thus, the expected voice-call duration is 10
times the data-packet length when $\mu_v = 1.0$, and 100 times the
data-packet length when $\mu_v = 0.1$. Therefore, the total offered
voice and data loads are the same in these simulations, but the
parameter sets differ in the voice rates $\lambda_v$ and $\mu_v$. This is of
interest because our analytical delay estimate, which is based
on the product-form solution of the voice process, only
considers the offered voice load (i.e., $\rho_j$). It does not account
for the actual values of $\lambda_v$ and $\mu_v$. By keeping the offered load
the same, and varying the voice rate (i.e., varying $\lambda_v$ and $\mu_v$
subject to $\lambda_v/\mu_v = 4$), we can evaluate the impact of this
simplification in the analytical delay estimate.

The  y  axis  represents  the  rank  assigned  to  the  policy  (x 
axis)  as  a  result  of  SC  simulations.  Again,  a  straight  line  with 
unit  slope  would  indicate  perfect  agreement.  The  figure 
shows  that  the  agreement  for  all  four  simulations,  although 
imperfect,  is  impressively  good;  the  four  curves  are  virtually 
indistinguishable,  and  therefore  they  are  not  labeled.  The 
deviation  in  ranking  among  the  top  54  policies  is  never  more 
than three. Furthermore it is interesting that, despite the
significant differences between the simulation models (data
receivers assumed vs. data receivers verified) and the voice
rates ($\lambda_v = 4$ vs. $\lambda_v = 0.4$), which produce significantly
different values of delay (the difference can be several orders
of magnitude [7, 20]), the rankings are very similar among the
different simulation runs. This is true even though, for the
case of the receivers-verified model, all but the top four
policies represent operation in the saturation region. Thus, we
see that both the crude analytical model (i.e., the M/D/1
model) and the imprecise simulation model (the
receivers-assumed model) produce highly accurate ordinal rankings,
even though they provide poor estimates of actual
performance.

7.  CONCLUSIONS 

A  variety  of  approaches  are  available  for  ordinal 
optimization.  Most  of  our  work  has  been  based  on  the  use  of 
the  Standard  Clock  approach,  which  provides  an  efficient 
means  to  simulate  structurally  similar,  but  parametrically 
different,  systems  in  parallel.  With  this  technique,  the  ability 
to  rapidly  determine  good  ordinal  rankings  is  a  consequence  of 
the  use  of  a  common  event  sequence  to  drive  all  of  the  parallel 
simulations.  This  common  event  sequence  introduces  a  high 
level  of  correlation  among  the  parallel  experiments.  In  this 
paper  we  compared  SC  techniques  to  the  use  of  the  method  of 
Common  Random  Numbers  (CRN),  which  is  implemented  by 
using  custom  ratio  yardsticks  (and  hence  custom  alias  tables). 
We  have  observed  that  the  use  of  the  CRN  approach  provides 
little  improvement  over  the  use  of  independent  brute  force 
simulations,  apparently  because  there  is  little  correlation 
among  the  experiments. 

We  also  demonstrated  that  crude  analytical  models  and 
imprecise  simulation  models,  which  are  incapable  of 
providing  acceptable  estimates  of  system  performance, 
nevertheless  can  be  useful  tools  for  ordinal  optimization. 
ABSTRACT

In this talk, we explain the advantages of selection-based search
schemes from a theoretical perspective. The role of selection in
tackling many difficult engineering problems has been empirically
demonstrated with noticeable successes. This is also confirmed in
recent AI research. We show that one can achieve an impressive
gain in computational efficiency when a certain strategic definition of
the optimization goal is adopted. In this light, we propose a
generalized optimization paradigm called ordinal optimization,
which pertains to performance analysis of many computationally
hard problems in a pragmatic sense.

KEYWORDS: stochastic optimization, sampling and
selection,  goal  softening,  learning  algorithms 


1.  INTRODUCTION 

The  following  optimization  objective  is  often  sought  in  various 
design  problems: 

$$\min_{\theta \in \Theta} J(\theta) = E[L(\theta)] \qquad (1)$$

where $\Theta$ is the design space in which $\theta$ is a design parameter,
and $J(\theta)$ is the expected value of the performance measurements
$L(\theta)$. While mathematically succinct, equation (1) is difficult
to solve in many real-world design scenarios, e.g. intelligent
systems, owing to the following challenges:

A.  Search: the design space $\Theta$ is huge and may have little
structure to be exploited. Typical examples are problems
in combinatorial optimization, which involve discrete or
even symbolic variables. On top of the exponential
growth in the size of the search space, local properties such as
gradient and convexity information may not be easily
constructed.

B.  Estimation: the functional $J(\theta)$ has to be calculated via
Monte Carlo simulation or through crude approximating
models. Lengthy and repeated computer experiments for
averaging out the $L(\theta)$'s are needed in order to obtain quality
answers. The theoretical limit on how fast statistical estimates
can converge also makes simulation by itself an arguably
insufficient means to solve statement (1) in view of a limited
computing budget.

Despite  the  above  challenges,  problems  such  as  (1)  are  still 
faced  by  designers  who  are  eager  to  obtain  quality  answers  in  a 
time-efficient  and  computationally  affordable  manner.  Ordinal 
optimization, proposed by Ho et al. [1] (see also Ho [15]), aims
at getting around the above-mentioned challenges. In fact,
ordinal optimization is a complementary technique which
enhances the capabilities offered by existing search methods
and estimation algorithms. We shall elaborate on ordinal
optimization in Section 2.

Sampling and search methods have recently gained
attention in AI research. Successes were reported in chess
tournaments, planning and scheduling applications [2], in
control problems [3], and in the networking area [4].
Intelligent search schemes combined with the advent of faster
and parallel computing technology provide a new avenue of
problem solving possibilities. A pertinent idea is to iteratively
learn about good solution "properties" by means of sampling
the search space. These "properties" can be, for example,
useful solution representations, promising regions of the search
space, or better heuristics. Genetic algorithms and
reinforcement learning are techniques that embrace these ideas
of iterative sampling and learning.

Meanwhile, due to the imprecision or randomness
involved in estimating design performances (e.g., in Monte
Carlo experiments), it becomes necessary to devise selection
methods so that learning can be effectively achieved through
evaluating the design samples. This motivates our main topic
in this paper: selection schemes in ordinal optimization,
which will be discussed in Section 3. Subsequently, we will
connect the selection schemes to some ideas of iterative
learning in Section 4.

i This work is partially supported by NSF grants EID-9212122, EEC9402384, Army contracts DAAL03-92-GO1 15, DAAH04-95-0148 and Air Force contract F49620-95-1-0131.

1 Post-doctoral fellow, Division of Engineering and Applied Sciences, Harvard University. Email: twel@arcadia.harvard.edu.

2.  GENERALIZED  SELECTION  AND  GOALS 

Consider  a  collection  of  N  design  samples  drawn  from  the 
design  space.  There  is  a  true  order  of  these  N  designs  with 
respect  to  the  true  performances.  If  this  order  were  known, 
then  we  could  easily  pick  out  the  best,  the  second  best,  etc., 
and  the  selection  problem  is  essentially  solved.  However,  the 
observed  order  of  these  N  designs  is  inevitably  perturbed  from 
the  true  order  due  to  noisy  evaluation  of  design  performances. 
In particular, suppose the designer insists on getting one
design which is the true best. Then lengthy evaluation is
needed so that the resulting order of the N designs turns
out with the true best design showing up as the observed best. If
the noise factor is large, then it is possible that the true best is
displaced to the last position of the observed order of the N
designs. Therefore, we bring forth two major ideas in ordinal
optimization:

•  Ordinal comparison: it is much easier to decide whether
"A > B?" than "A - B = ?" In other words, one can
estimate the observed performance order better than the
magnitude of performance values. In fact, this has been
formally established by Dai [5] and Xie [6], and extended
by Dai and Chen [14]. Their results show that the
probability of answering the former question correctly
converges to one exponentially fast in the simulation
horizon.

•  Goal softening: instead of picking a single design to
match the very best in a population, which is an unlikely
event, one may soften the goal to settle on the top-n% of
the population by selecting a subset of designs. Lee et al.
[7] have studied this viewpoint using an order statistics
formulation, and show that it also has an exponential
advantage in terms of correct selection.
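
The first of these ideas can be illustrated numerically. The sketch below assumes i.i.d. Gaussian observation noise, an assumption made only for this example and not part of the cited results. Under it, the probability that two designs are ordered correctly from $n$ samples each is $\Phi(\delta\sqrt{n}/(\sigma\sqrt{2}))$, which approaches 1 with a Gaussian (exponentially decaying) tail. The confidence half-width of a single performance estimate, by contrast, shrinks only as $1/\sqrt{n}$. All names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_correct_order(delta, sigma, n):
    """P(sample mean of A < sample mean of B) after n samples of each design.

    B's true value exceeds A's by delta; noise is assumed N(0, sigma^2).
    """
    return normal_cdf(delta * math.sqrt(n) / (sigma * math.sqrt(2.0)))

def ci_half_width(sigma, n, z=1.96):
    """Approximate 95% confidence half-width of one estimated value."""
    return z * sigma / math.sqrt(n)

for n in (10, 100, 1000):
    print(n, prob_correct_order(0.5, 1.0, n), ci_half_width(1.0, n))
```

The loop prints, for each simulation length, how certain the ordering is next to how wide the value estimate remains.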

These two tenets are manifested by a quantity called
alignment probability, defined as

$$P_t(g, s, k) = P(|G \cap S| \ge k;\ t) \qquad (2)$$

where $G$ is the set of top-$g$ true best designs (called good
enough designs), $S$ is the set of top-$s$ observed best
designs, $k$ is called the alignment level, and $t$ is the index for
the simulation horizon. Notice that in conventional optimization
one often asks for $g = s = 1$. The alignment probability of such
a case is often too small to be of any interest. On the other
hand, Dai [5] has shown that $1 - P_t(1,1,1)$ decays exponentially
as the simulation proceeds, i.e., for some $\alpha > 0$ (which
depends on the N design performances), we have

$$1 - P_t(1,1,1) \le \exp(-\alpha t).$$

Xie  [6]  has  generalized  the  result  for  other  values  of  g,  s  and  k. 

Meanwhile, in Lee et al. [7], it is shown that $1 - P_t(g, s, 1)$
decreases exponentially as the sizes $g$ and $s$ are relaxed, i.e.,

$$1 - P_t(g, s, 1) \approx \exp(-\mathrm{fcn}(g, s, t)).$$

The implication of these results is that if one is willing
to relax the goal and select a subset of designs for further
analysis or learning, then it is possible to

i) execute shorter simulations and tolerate larger confidence
intervals of performance estimates; and/or

ii) run a crude or surrogate model that is computationally
cheaper and more amenable to analysis.

These advantages translate into impressive computational
savings and speed-up, and have been confirmed by various
empirical studies (see [4] and [8] to [12]).

3.  EXPECTED  ALIGNMENT  LEVELS 

In this section, we look closer into the effects of goal softening.
Given that the designer is willing to accept a good enough
subset $G$ and to select a subset $S$ from short simulation or
crude evaluation, a useful quantity to assist decision is the
expected alignment level $E_t[k]$ given by

$$E_t[k] = \sum_k k\, P(|G \cap S| = k;\ t) \qquad (3)$$

It is not difficult to see that the expected alignment level is an
increasing function of $g$ and $s$. In the absence of true
performances, one has to estimate $E_t[k]$ based on noisy
observed performances. However, we shall demonstrate that
this can be approximately obtained if the designer is able to
estimate the type of underlying true performance profile and
noise magnitude. The true performance profile will be referred
to as the ordered performance curve (OPC). It is a
non-decreasing curve of true performances plotted against the
integral values of the true performance order. In general there
are infinitely many possible forms of OPCs but several types of
OPC can be categorized, which are shown in Figure 1 below.
See Lau and Ho [13] for detailed discussion regarding ordered
performance curves.

[Figure 1 plot; horizontal axis: ordered designs.]

Figure  1.  Different  types  of  OPCs 

For illustrative purpose, let us consider the case that
all true performances are equally spaced apart, which is
equivalent to a linear OPC of the form

$$J(\theta_{[i]}) = \alpha\, r(\theta_{[i]}) + c_0 \qquad (4)$$

where $\theta_{[i]}$ is the $i$th true best design, and $r(\cdot)$ gives the true
rank of a design. Notice that $r(\theta_{[i]}) = i$. Clearly this
relationship would not be available, for otherwise there would
be no problem for selection in the first place. Consider that the
observed performance values are generated by the
additive noise model

$$\hat{J}(\theta_{[i]}) = J(\theta_{[i]}) + \omega(\theta_{[i]}) \qquad (5)$$

where $\omega(\theta_{[i]})$ is the noise disturbance, which we assume to be
independent across designs. A typical picture of observed
performances is shown in Figure 2.

[Figure 2 plot; horizontal axis: true performance order.]


Figure  2.  Design  order  is  perturbed  by  noisy  observations 

Suppose the designer is able to give a rough guess that the
underlying true performances are approximately uniformly
separated (by means of short simulation, for example). Then,
the expected alignment levels can be shown as in Figures 3a, 3b
and 3c for signal-to-noise ratios 1:5, 1:2 and 1:1 respectively.1
These diagrams are generated based on a total of 1,000 design
samples with linear OPC and independent uniform noise
density $U[-W, W]$.
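
The setup just described can be imitated with a small Monte Carlo experiment. The sketch below assumes a linear OPC normalized to the range [0, 1], independent uniform noise whose range is a chosen multiple of the performance range (5 for a 1:5 signal-to-noise ratio), and selection of the $s$ designs with the best observed values. It is an illustration with hypothetical names and is not expected to reproduce the published curves exactly.

```python
import random

def expected_alignment(n, g, s, noise_to_signal, trials, seed=0):
    """Estimate E|G ∩ S| for a linear OPC with independent uniform noise.

    n: number of designs; g: size of the good-enough set G;
    s: size of the selected set S;
    noise_to_signal: noise range divided by the range of true performances.
    """
    rng = random.Random(seed)
    # Linear OPC: design i (0 = true best) has true performance i / (n - 1).
    true_perf = [i / (n - 1) for i in range(n)]
    half_width = noise_to_signal / 2.0  # noise is uniform on [-W, W]
    total = 0
    for _ in range(trials):
        observed = [j + rng.uniform(-half_width, half_width) for j in true_perf]
        # Select the s designs with the smallest (best) observed values.
        selected = sorted(range(n), key=lambda i: observed[i])[:s]
        # Designs 0..g-1 form the true top-g set G.
        total += sum(1 for i in selected if i < g)
    return total / trials

print(expected_alignment(1000, 50, 50, 5.0, 100))
print(expected_alignment(1000, 50, 50, 2.0, 100))
```

Increasing `trials` reduces the sampling error of the estimate; changing the noise multiple shows how the alignment level responds to longer or shorter simulations.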
Figure 3a. E[k] vs. g for s = 10 to 100 with linear OPC and
signal-to-noise ratio 1:5.

Figure 3b. E[k] vs. g for s = 10 to 100 with linear OPC and
signal-to-noise ratio 1:2.

Figure 3c. E[k] vs. g for s = 10 to 100 with linear OPC and
signal-to-noise ratio 1:1.

1 Signal-to-noise ratio is defined as the ratio of the
range of true performances to the range of noise values.
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Let  us  interpret  these  diagrams.  In  the  presence  of 
very  large  noise,  such  as  1:5,  if  one  is  willing  to  relax  the  goal 
to  g  =  50  designs,  then  one  can  expect  about  5.5  designs  in  a 
selection  of  s  =  50  designs.  If  more  simulation  is  performed 
and  signal-to-noise  ratio  is  reduced  to  1.2,  then  on  average 
about  10  .5  of  the  top-50  true  best  designs  are  found  in  the  top- 
50  observed  best  designs.  Notice  that  should  the  designer 
insist  on  getting  g  =  s  =  1,  then  much  longer  simulation  is 
needed  in  order  to  get  the  selected  best  to  be  indeed  the  tme 
best.  The  computational  advantage  here  is  several  folded. 
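The alignment level discussed above can be estimated with a short Monte Carlo experiment. The following is only a sketch under stated assumptions: the function name `expected_alignment` and all of its parameters are hypothetical, smaller performance values are taken to be better, the true performances are equally spaced (linear OPC), and the noise is U[-w, w] with its range set to `noise_to_signal` times the range of the true performances. It is not the calculation used to produce Figures 3a to 3c, so its numbers need not match them.

```python
import random


def expected_alignment(n_designs=1000, g=50, s=50, noise_to_signal=5.0,
                       trials=200, seed=0):
    """Estimate E[k]: the mean number of true top-g designs found among
    the s designs with the best observed performance (horse race rule)."""
    rng = random.Random(seed)
    # Linear OPC: equally spaced true performances; smaller is better.
    true_perf = list(range(n_designs))
    signal_range = n_designs - 1
    # Noise ~ U[-w, w]; its range 2w is noise_to_signal * signal_range.
    w = noise_to_signal * signal_range / 2.0
    total = 0
    for _ in range(trials):
        observed = [p + rng.uniform(-w, w) for p in true_perf]
        # Horse race rule: keep the s designs with the best observed value.
        selected = sorted(range(n_designs), key=observed.__getitem__)[:s]
        # Designs 0 .. g-1 are the true top g.
        total += sum(1 for i in selected if i < g)
    return total / trials


if __name__ == "__main__":
    for ratio in (5.0, 2.0, 1.0):
        print(f"signal-to-noise 1:{ratio:g} ->",
              expected_alignment(noise_to_signal=ratio))
```

With zero noise the selected set coincides with the true top designs; as the noise range grows, the overlap shrinks toward the blind-pick level g*s/N.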


4.  DISCUSSIONS 

We  have  seen  in  the  previous  section  that  softening  the  goal  of 
selection  provides  practical  savings  in  the  simulation  budget. 
The  selection  scheme  is  based  on  observed  performances  and 
thus  the  name  Horse  Race  selection  rule  is  coined.  Other 
selection  rules  are  also  possible  and  generally  produce 
different  alignment  results.  The  study  of  various  selection 
rules  is  an  on-going  research  topic.  Besides  the  above 
illustrative  results,  we  also  have  extensive  calculations  based 
on  a  variety  of  types  of  OPC  and  noise  ranges,  also  reported  in 
Lau  and  Ho  [13]. 

From an algorithmic point of view, with a selected
subset that guarantees a certain alignment with the good enough
designs, a designer can run longer simulations on this subset to
learn about good "properties" of solutions, as mentioned in
Section 1. The subsequent steps would be to develop adaptive
search schemes that utilize as much as possible the knowledge
learned from this selected subset. This sampling-selection-learning
approach laid out by ordinal optimization carries
important implications for the design of intelligent systems.

Furthermore, ordinal optimization advocates a new
paradigm for stochastic optimization of
computationally difficult problems; namely, ordinal
comparison and goal softening. It serves as a complementary
methodology to existing approaches such as genetic
algorithms, reinforcement learning, other artificial intelligence
techniques, and the conventional procedures
discussed in the optimization literature. We believe that this
warrants further research attention. In the theoretical
development of ordinal optimization, we have current results
that tackle noise estimation, design-dependent noises and
constrained optimization problems.

ABSTRACT

Simulation  plays  a  vital  role  in  analyzing  many  discrete-event 
systems.  Usually,  using  simulation  to  solve  such  problems 
can  be  both  expensive  and  time  consuming.  We  present  an 
effective  approach  to  intelligently  allocate  computing  budget 
for  discrete-event  simulation.  This  approach  can  intelligently 
determine  the  best  simulation  lengths  for  all  simulation 
experiments  and  significantly  reduce  the  total  computation  cost 
to  obtain  the  same  confidence  level.  Numerical  testing  results 
are  included.  Also  we  compare  our  approach  with  traditional 
two-stage  procedures.  Numerical  results  show  that  our 
approach  is  much  faster  than  the  traditional  two-stage 
procedures. 

KEYWORDS:  Discrete-event  simulation,  optimization, 
ranking  and  selection. 

1. WHY INTELLIGENT COMPUTING
BUDGET ALLOCATION?

It is often necessary to apply extensive simulation to design or
efficiently manage large real-life systems such as
communication networks, traffic systems, and automated
manufacturing plants. Unfortunately, simulation can be both
expensive and time consuming. Suppose we want to compare
k different discrete-event systems (designs or alternatives) and
we perform N simulation replications for each of the k designs.
(Without loss of generality, we consider terminating simulation
in this paper. Our approach is also applicable to steady-state
simulation, although we then need N independent samples
rather than N independent simulation replications.) We
therefore need kN simulation replications. The simulation
results become more accurate as N increases. If the accuracy
requirement is high (N is not small) and the total number of
designs in a design problem is not small (k is large), then kN
can be very large, which may easily make the total simulation
cost extremely high and preclude the feasibility of the
simulation approach. Effectively reducing the computation
cost of obtaining a good decision is therefore crucial to the
application of simulation.

Instead of equally simulating all k designs, a more
efficient way is to use different numbers of simulation
replications for different designs. The numbers of replications
are determined using preliminary simulation information. A
central issue is how to effectively utilize the available
information and how to intelligently determine the best
numbers of simulation replications for all designs. Our goal is
to significantly reduce simulation cost while obtaining the
desired simulation quality. In fact, this question is equivalent
to optimally deciding which designs will receive computing
budget for continued simulation, or to finding an optimal way
to reach an optimal design.

Dudewicz and Dalal (1975) develop a two-stage procedure
for selecting the best design or a design that is very close to
the best system. In the first stage, all systems are simulated
with $n_0$ replications. Based on the results obtained from the
first stage, we estimate how many more simulation
replications should be conducted for each design in the second
stage in order to reach the desired confidence level. Rinott
(1978) presents another way to estimate the number of required
simulation replications in the second stage. Many researchers
have extended this idea to more general ranking and selection
problems in conjunction with new developments. Among
them are Chiu (1974), Gupta and Panchapakesan (1979),
Charnes (1991), Matejcik and Nelson (1993), Bechhofer,
Santner, and Goldsman (1995), Futschik and Pflug (1996), and
Hsu (1996).

Building on Chen (1995), we present an effective
approach to intelligently allocate computing
resources for discrete-event simulation. We also compare
our approach with the traditional two-stage procedures by
conducting a numerical experiment. Numerical results show
that our approach is more than ten times faster than the
two-stage procedures. We formulate the problem of intelligent
computing resource allocation as an "optimal computing budget
allocation" problem in the next section. In Section 3, we
present a simple sequential approach. We show the
numerical results in Section 4. Section 5 compares our
method with the two-stage procedures. Section 6 concludes this
paper.


2. OPTIMAL COMPUTING BUDGET
ALLOCATION

Suppose that our goal is to select a design associated with the
smallest mean performance measure among k designs with
unknown variances that are not necessarily equal. Further
assume that the computing budget is limited and the number
of designs is not small.

Denote

$k$ : the total number of designs,

$X_{ij}$ : the $j$-th i.i.d. sample of the performance measure from
Design $i$,

$N_i$ : the number of simulation replications for Design $i$,

$\bar{\mu}_i$ : the sample mean for Design $i$; $\bar{\mu}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} X_{ij}$,

$\mu_i$ : the mean performance measure; $\mu_i = E(X_{ij})$.

For steady-state simulation, the Batch Means method
(Schmeiser 1982) can be used if the simulation samples from
any design are not independent. Again, we consider
terminating simulation in this paper. The independence
assumption is not a problem for practical applications.

When the $N_i$'s are large, $\bar{\mu}_i$ can be a good approximation for
$\mu_i$, since, according to the law of large numbers,
$\bar{\mu}_i \to \mu_i$ with probability 1 as $N_i \to \infty$. Given that we can
only perform a finite number of simulation replications, $\bar{\mu}_i$ is
simply an approximation to $\mu_i$. When using the approximation
results to select the best design (without loss of generality, we
consider minimization problems in this paper; thus the "best"
design means the design with the smallest $\mu_i$), we have to ask
what the probability of correct selection is. Correct
selection is defined as the event that the design with the smallest
sample performance measure is actually the best design (the
one with the smallest population performance). In the remaining
part of this paper, let "CS" denote the event "correct selection."

Chen (1996) provides an effective way to quantify the
confidence level when the number of systems is large.
Furthermore, the sensitivity of the confidence
level with respect to the number of simulation replications can
be easily obtained when the approach in Chen (1996) is applied,
which provides the basis for determining how to allocate the
computing budget in this paper. From Chen (1996) we have

$$P\{CS\} = P\{\text{a system with the smallest sample mean performance is actually the best system}\} \ge \prod_{i=1,\, i \ne b}^{k} P\{\tilde{\mu}_b < \tilde{\mu}_i\} \equiv APCS \tag{1}$$

where index $b$ is the design having the smallest sample mean
performance, i.e.,

$$b = \arg\min_i \{\bar{\mu}_i\},$$

and $\tilde{\mu}_i$ follows the posterior distribution, which consists of
information from both the prior distribution and the samples
$\{X_{ij}, j = 1, 2, .., N_i\}$. Under the assumption of normality,

$$\tilde{\mu}_i \sim N\left(\frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij},\ \frac{\sigma_i^2}{N_i}\right) \quad \text{for } i = 1, 2, .., k,$$

where $\sigma_i^2$ denotes the variance of $X_{ij}$.

We refer to this lower bound of the correct selection
probability as the Approximate Probability of Correct
Selection (APCS). While P{CS} is very difficult to obtain,
APCS can be computed very easily. We will use APCS to
approximate P{CS}. Numerical testing in Chen (1996) shows
that it can provide a reasonably good approximation.
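To make the bound in (1) concrete, the product can be evaluated with the normal distribution function once the posteriors are taken to be independent normals. The sketch below is illustrative only: the names `normal_cdf` and `apcs` are hypothetical, and the unknown variances are assumed to be replaced by sample variances, which is an assumption of this example rather than a statement of the procedure in Chen (1996).

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def apcs(sample_means, sample_vars, n_reps):
    """Product over i != b of P{mu_b < mu_i}, with independent posteriors
    mu_i ~ N(sample_means[i], sample_vars[i] / n_reps[i])."""
    k = len(sample_means)
    # b is the design with the smallest sample mean (minimization problem).
    b = min(range(k), key=sample_means.__getitem__)
    prob = 1.0
    for i in range(k):
        if i == b:
            continue
        # mu_b - mu_i is normal with this variance and mean m_b - m_i.
        diff_var = sample_vars[b] / n_reps[b] + sample_vars[i] / n_reps[i]
        prob *= normal_cdf((sample_means[i] - sample_means[b])
                           / math.sqrt(diff_var))
    return prob


if __name__ == "__main__":
    print(apcs([0.0, 1.0, 1.5], [1.0, 1.0, 1.0], [10, 10, 10]))
```

Because each factor depends on the replications only through the variance terms, adding replications to design b or to a close competitor raises APCS, which is the sensitivity exploited by the allocation scheme.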

As motivated in Section 1, we intend to minimize the
total computation cost while obtaining a desired confidence
level. If simulations are performed on a sequential computer,
the computation cost can be approximated by
$N_1 + N_2 + \cdots + N_k$. Ideally we want to

$$\min \{N_1 + N_2 + \cdots + N_k\} \quad \text{s.t. } APCS \ge P^*, \tag{2}$$

where $P^*$ is a user-defined confidence level requirement.


3.  A  SEQUENTIAL  APPROACH 

We now present a sequential approach to intelligently
determine the number of simulation replications. Instead of
finding the best $N_1, N_2, .., N_k$ at the beginning of simulation,
we sequentially select some PROMISING designs and
simulate these selected PROMISING designs. The
PROMISING designs are those whose further simulation is
expected to maximize the improvement of APCS.

Before any simulation is done, there is neither knowledge
about APCS nor any basis for allocating the budget;
therefore all designs are first simulated with $n_0$ replications,
and the posterior distribution for design $i$ is

$$N\left(\frac{1}{n_0}\sum_{j=1}^{n_0} X_{ij},\ \frac{\sigma_i^2}{n_0}\right).$$

We use this statistical information to decide on the further
allocation. In other words, after running $n_0$ replications for
each design, we have a basic idea about each design and can
decide which designs are worthy of being allocated more
computing budget. Let $\Delta_i$ be the additional computing budget
allocated to design $i$ in each step ($\Delta_i$ is a non-negative
integer). In order to find the PROMISING designs and
effectively allocate computing budget for further simulation, it
is necessary to know how APCS would be affected if
additional simulation budget $\Delta_i$ is added to design $i$. Under
the Bayesian model, it is convenient to use the statistical
information at $n_0$ to estimate APCS at $n_0 + \Delta_i$ by using an
approximated posterior distribution

$$N\left(\frac{1}{n_0}\sum_{j=1}^{n_0} X_{ij},\ \frac{\sigma_i^2}{n_0 + \Delta_i}\right)$$

for design $i$.

We refer to this approximation as EAPCS (estimated
approximate probability of correct selection). We assume that
$\Delta_i$ is not large, so that $n_0$ is close to $n_0 + \Delta_i$; otherwise EAPCS
is not a good estimator for APCS.

Similarly, when the simulation has proceeded to
$(N_1, N_2, .., N_{i-1}, N_i, N_{i+1}, .., N_k)$, we can also use the available
information to estimate how APCS will change if design $i$ is
given additional budget $\Delta_i$, i.e.,
$EAPCS(N_1, N_2, .., N_{i-1}, N_i + \Delta_i, N_{i+1}, .., N_k)$,
by using an approximated posterior distribution

$$N\left(\frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij},\ \frac{\sigma_i^2}{N_i + \Delta_i}\right)$$

for design $i$.

Let $\Delta = \sum_{i=1}^{k} \Delta_i$. Thus $\Delta$ is the one-time incremental
computing budget in our sequential algorithm. In Section 5 we
have more details about how to choose $n_0$ and $\Delta$. Since we hope
that APCS becomes larger as the simulation proceeds, we
sequentially add computing budget $\Delta$ each time until APCS
achieves a satisfactory level $P^*$. In order to reduce the total
computation cost, this budget $\Delta$ should be optimally allocated
so that the EAPCS is maximized. Thus, at step $l$, $l = 1, 2, ...$,
we have

$$\max_{\Delta_1^l, \ldots, \Delta_k^l} EAPCS(N_1^l + \Delta_1^l,\ N_2^l + \Delta_2^l,\ \cdots,\ N_k^l + \Delta_k^l) \quad \text{s.t. } \sum_{i=1}^{k} \Delta_i^l = \Delta \text{ and } \Delta_i^l \ge 0 \text{ for all } i. \tag{3}$$

In summary, we have the following algorithm:

A Sequential Algorithm for Optimal Computing
Budget Allocation (OCBA)

Step 0. Perform $n_0$ simulation replications for all designs;
$l \leftarrow 0$; $N_1^l = N_2^l = \cdots = N_k^l = n_0$.

Step 1. If $APCS(N_1^l, N_2^l, \cdots, N_k^l) \ge P^*$, stop; otherwise,
go to Step 2.

Step 2. Solve (3).

Step 3. Perform additional $\Delta_i^l$ simulation replications for
design $i$, $i = 1, .., k$; $N_i^{l+1} = N_i^l + \Delta_i^l$ for $i = 1, .., k$;
$l \leftarrow l + 1$. Go to Step 1.

To solve (3), we assume the variables are continuous and
apply the steepest-descent method (Luenberger 1984; Nash and
Sofer 1996) to approximately solve it. The resulting
numbers of simulation replications are then rounded off to
integers. The gradient of EAPCS with respect to $N_i$ is
estimated by the following formula:

$$\frac{EAPCS(N_1, N_2, \cdots, N_i + \tau, N_{i+1}, \cdots, N_k) - APCS(N_1, N_2, \cdots, N_i, N_{i+1}, \cdots, N_k)}{\tau} \tag{4}$$

where $\tau$ is a small number. To avoid spending much time in
iteratively finding the solution of (3), we perform only a very
limited number of iterations of the steepest-descent
method. Other optimization techniques can also be used
to solve (3); please refer to Chen et al. (1997) for more details.
Chen et al. (1997) also gives guidelines for the selection of $n_0$
and $\Delta$.
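The sequential procedure and the finite-difference gradient (4) can be sketched in a few lines. This is a simplified illustration, not the method of Chen et al. (1997): instead of steepest-descent iterations on (3), each increment is split in proportion to the estimated gradient, which is a stand-in chosen for brevity. The names `ocba`, `simulate`, `tau` and `max_budget` are hypothetical, sample variances stand in for the unknown variances, and `tau = 1` (one replication) is an assumed value for the "small number" in (4).

```python
import math
import random
import statistics


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def apcs(means, variances, counts):
    """Product over i != b of P{mu_b < mu_i}; counts may be fractional,
    so the same routine evaluates EAPCS with N_i replaced by N_i + tau."""
    b = min(range(len(means)), key=means.__getitem__)
    p = 1.0
    for i in range(len(means)):
        if i != b:
            sd = math.sqrt(variances[b] / counts[b] + variances[i] / counts[i])
            p *= normal_cdf((means[i] - means[b]) / max(sd, 1e-12))
    return p


def ocba(simulate, k, p_star=0.9, n0=10, delta=12, tau=1.0, max_budget=5000):
    """simulate(i) returns one replication of design i (smaller is better)."""
    # Step 0: n0 replications for every design.
    samples = [[simulate(i) for _ in range(n0)] for i in range(k)]
    while True:
        means = [statistics.mean(x) for x in samples]
        variances = [statistics.variance(x) for x in samples]
        counts = [len(x) for x in samples]
        current = apcs(means, variances, counts)
        # Step 1: stop at the required level (max_budget is a safety cap).
        if current >= p_star or sum(counts) >= max_budget:
            return min(range(k), key=means.__getitem__), counts, current
        # Step 2 (simplified): gradient estimate from formula (4).
        grads = []
        for i in range(k):
            bumped = list(counts)
            bumped[i] += tau
            grads.append(max((apcs(means, variances, bumped) - current) / tau, 0.0))
        total = sum(grads)
        weights = [g / total for g in grads] if total > 0 else [1.0 / k] * k
        alloc = [int(delta * w) for w in weights]
        # Rounding leftovers go to the design with the largest gradient.
        alloc[max(range(k), key=grads.__getitem__)] += delta - sum(alloc)
        # Step 3: run the additional replications.
        for i in range(k):
            samples[i].extend(simulate(i) for _ in range(alloc[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    best, counts, value = ocba(lambda i: rng.gauss(0.3 * i, 1.0), k=5)
    print(best, counts, round(value, 3))
```

In this sketch most of the budget flows to the observed best design and its nearest competitors, since those are the designs whose posterior variances most affect the product in (1).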


4.  NUMERICAL  TESTING 

This section presents the numerical testing results using our
OCBA algorithm. We test a simple G/G/1/∞ queue (k = 10).
There is one server with uniformly distributed service times
and interarrival times. In this single-node example, all designs
have the same interarrival time distribution uniform[0.1, 1.9],
and the service time in system i is uniform[0.1, 1.8-0.05 i],
i = 1, 2, .., 10. We want to find a design with minimum average
system time for customers served in the first 10 time units
(terminating simulation). Obviously, a higher service rate results
in a shorter system time in this simple example; therefore,
design 1 is the true best design. In the numerical experiment,
we compare the computation costs and the actual convergence
probabilities P{CS} obtained with and without the OCBA
approach.

We set $\Delta = 12$ and $n_0 = 10$ in this example. To avoid
spending too much time in solving (3), we perform only two
iterations of the gradient method. 10,000 independent
experiments are performed so that the average computation
cost and P{CS} can be estimated. Different confidence level
requirements are also tested. Table 1 contains the numerical
results using the OCBA algorithm and Table 2 contains the
results without using OCBA.


P*      Total # of Replications    P{CS}
60%     196.4                      72.2%
80%     344.3                      86.6%
90%     523.6                      96.3%
95%     735.4                      98.1%

Table 1. Average total number of simulation replications and
P{CS} for OCBA application ($n_0$ = 10).


P*      Total # of Replications    P{CS}
60%     541.8                      82.9%
80%     863.6                      90.7%
90%     1474.6                     96.9%
95%     2175.5                     99.0%

Table 2. Average total number of simulation replications and
P{CS} without applying the OCBA algorithm ($n_0$ = 10).


Comparing  Tables  1  and  2,  we  observe  that  our  OCBA 
can  achieve  the  desired  P{CS}  with  much  lower  computation 
cost.  Figure  1  shows  how  the  OCBA  algorithm  allocates 
computing  budget  to  different  designs. 

5. COMPARISON WITH OTHER
METHODS

In this section we compare our approach with the two-stage
procedures given by Dudewicz and Dalal (1975) and
Rinott (1978) using the testing example in Section 4. Unlike
our Bayesian approach, these two-stage procedures are
developed based on the classical statistical model. The idea of
an indifference zone is also required to apply such two-stage
procedures.

Two confidence levels, P* = 90% and 95%, have been tested.
When applying Rinott's and Dudewicz's procedures, we set the
indifference zone d* = 0.059 ($\mu_{(2)} - \mu_{(1)} = 0.059$). In both cases
10,000 independent experiments are run to evaluate the
computational efficiency and to estimate the actual
convergence probabilities. The computation cost and P{CS}
are given in Table 3. From Table 3, significant speedup is
observed for our method over both two-stage procedures, while
the actual convergence probabilities for all approaches are no
less than the desired levels. The speedup factors are higher
than 14 in all cases.


Methods             Total # of Replications    P{CS}    Speedup factor using OCBA
OCBA                523.6                      96.2%
Dudewicz's Proc.    8059.9                     98.3%    15.3
Rinott's Proc.      9479.3                     98.6%    18.1

Table 2. Comparison of our OCBA approach and traditional
two-stage procedures. P* = 90%.

Methods             Total # of Replications    P{CS}    Speedup factor using OCBA
OCBA                735.4                      98.1%
Dudewicz's Proc.    11023.2                    99.4%    14.6
Rinott's Proc.      12464.1                    99.5%    16.5

Table 3. Comparison of our OCBA approach and traditional
two-stage procedures. P* = 95%.

6.  Concluding  Remarks 


In this paper we present an optimal computing budget
allocation technique that can intelligently determine the best
numbers of simulation replications for all designs. We also
compare our approach with traditional two-stage procedures by
conducting a numerical experiment. Numerical testing shows
that our approach is more than ten times faster than the
two-stage procedures. For more details regarding this technique,
readers are referred to Chen et al. (1997).

Figure 1. Computing budget allocation determined by OCBA. P* = 90% and $n_0$ = 10 (design 1 is the best design).

Abstract

In this paper, properties of ordinal comparison for discrete-event
dynamic systems are investigated by employing the
large deviation principle, which allows one to obtain an
expression for the rate of convergence of ordinal comparison.
With this expression, conditions are obtained under which
the rate of convergence of ordinal comparison is exponential.
Such an expression also enables one to obtain bounds
on the rate of convergence and to design sample path generation
schemes that maximize the convergence rate of ordinal
comparison.

1  Introduction 

Many design problems in Discrete-Event Dynamic Systems
(DEDS) require choosing one design from a large, discrete
design space, such as the problem of resource allocation
[5]. Such problems are of a combinatorial nature and known
to be NP-hard (difficult to solve [17]). The design of DEDS
is further complicated by the fact that not only are the design
spaces large and discrete, but the performance measures
on which designing is based also do not have closed-form
solutions. Except for special cases, discrete-event simulation
or sample path observation is usually used in performance
estimation and evaluation of designs in DEDS.
Such performance estimation is time-consuming and costly.
Typically, a performance estimator converges slowly, at a
rate of $O(1/\sqrt{t})$ in simulation or observation time $t$ [9].

On the other hand, in reality, we often face a decision
making problem of choosing one relatively better design
from all possible alternatives. As long as we can single out
the "good" and "satisficing" designs, the exact values of
their performance measures are of secondary importance.
Such goal relaxation can bring great savings of effort. In the
context of recently proposed ordinal optimization methods,
ordinal comparison of different designs by their relative
ranks is very efficient, as observed in many experiments.
Ordinal comparison is able to discern quickly the good
designs using noisy information [1, 6, 7, 10, 13, 14, 18].
By ordinal comparison, we mean comparing the relative
goodness (rank) of different designs without knowing the
exact values of the corresponding performance measures.
Recent research has revealed a more interesting phenomenon:
ordinal comparison is beneficial when used in conjunction
with conventional methods such as stochastic approximation
and simulated annealing, as reported in [3, 11, 22].

* This work was supported in part by the National Science
Foundation under grant No. ECS-9624279 and by the University of
Pennsylvania Research Foundation.

Chun-Hung Chen
Dept. of Systems Engineering

Consider the problem of finding the best or a good design
among N possibilities. Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ denote
the set of all designs and $J(\theta) \in R$ denote the performance
measure of a particular design $\theta \in \Theta$. In general DEDS,
the exact form of $J(\theta)$ is not available. We can only use a
noisy estimate of $J(\theta)$ in our decision making. Let us consider
the dynamics of the following experiment. We simultaneously
simulate N parallel DEDS with designs $\theta_i$,
$i = 1, 2, \ldots, N$. This can be done, for example, using the
standard clock [21] or the augmented system analysis [4].
As the simulation continues, we collect data and output
an estimate of $J(\theta)$, denoted by $L(\theta, t)$, for every $\theta \in \Theta$.
Convergence of $L(\theta, t)$ to $J(\theta)$ is generally slow, at a rate
of at most $O(1/\sqrt{t})$ according to the law of large numbers
(assuming the validity of convergence). However, it
has been observed that the observed order of the performance
measures $L(\theta_i, t)$, $i = 1, 2, \ldots, N$, can quickly converge to
an order very close to the true order of the performance
measures $J(\theta_i)$, $i = 1, 2, \ldots, N$, despite the possible presence of
large noise [1, 6, 7, 10, 13, 14, 18].

This paper is concerned with several fundamental properties
of ordinal comparison in DEDS. By applying the
large deviation principle one is able to obtain an expression
for the rate of convergence of ordinal comparison. With
this expression, conditions are obtained under which the
rate of convergence of ordinal comparison is exponentially
fast. Such an expression also enables one to obtain bounds on
the rate of convergence and to design sample path generation
schemes that maximize the convergence rate of ordinal
comparison.

2  Problem  Statement 

The DEDS under investigation is described by a right-continuous
stochastic process $\{X(\theta, \xi, t) \in \mathcal{X}, t \ge 0\}$
parameterized by $\theta \in \Theta$ and defined on a common probability
space. Here, $\mathcal{X} \subset R$ is the state space, $\theta$ is used simply
to indicate a design, and $\xi$ represents all the randomness
involved. For each design $\theta \in \Theta$, let $L(\theta, t) \in R$ be an
estimate of a performance measure $J(\theta) \in R$ based on a
particular sample trajectory of $\{X(\theta, \xi, t)\}$ over $[0, t]$. In
DEDS, $J(\theta)$ is often a steady state performance and $L(\theta, t)$
is a sample performance over $[0, t]$. Note that $L(\theta, t)$ converges
slowly, with rate at most $O(1/\sqrt{t})$. Long and time-consuming
simulation has to be performed in order to have
a good estimate of the (steady state) performance measure
$J(\theta)$.

Now consider the problem of comparing performance
measures corresponding to a set $\Theta$ of $N$, $1 < N < \infty$,
designs. Design $\theta_i$ is said to be better than design $\theta_j$
if $J(\theta_i) > J(\theta_j)$. Without loss of generality, we assume
that the N designs are indexed in such a way that
$\infty > J(\theta_1) > J(\theta_2) > J(\theta_3) > \ldots > J(\theta_N) > -\infty$.
In particular, we are interested in finding a design that is one
of the $M$, $1 \le M < N$, true best designs in $\Theta$. For
convenience, let us define $\Theta_g = \{\theta_i, i = 1, 2, \ldots, M\}$ and
$\Theta_b = \{\theta_i, i = M+1, M+2, \ldots, N\}$ as the sets of "good" and
"bad" designs, respectively. For the problem of finding one
of the M best designs based on the simulation over $[0, t]$,
we choose the design with the largest sample performance,
i.e.,

$$\hat{\theta}_t = \arg\max_{\theta \in \Theta} L(\theta, t). \tag{2.1}$$


Experimental results have shown that (2.1) can quickly
find the true desired design [1, 6, 13, 18]. Intuitively, this
implies that the relative order of performance measures
converges very fast.

In order to characterize the convergence of ordinal
comparison, we define the following indicator process

$$I(t) = \begin{cases} 1 & \text{if } \max_{\theta \in \Theta_g} L(\theta, t) > \max_{\theta \in \Theta_b} L(\theta, t), \\ 0 & \text{otherwise.} \end{cases} \tag{2.2}$$

Then $I(t)$ is equal to 1 if the observed best design is
among the true good designs and equal to 0 otherwise.
Since $\max_{\theta \in \Theta_g} L(\theta, t)$ is the maximum observed performance
among the true "good" designs and $\max_{\theta \in \Theta_b} L(\theta, t)$ is the
maximum observed performance among the true "bad" designs,
$I(t)$ is a function of $t$ indicating when the observed best design
determined by (2.1) is one of the M true best designs.
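The indicator process (2.2) is easy to evaluate on simulated data. The sketch below is illustrative only: the function names are hypothetical, and the sample performance $L(\theta, t)$ is assumed to be the mean of $t$ i.i.d. normal observations, so it is drawn directly from a normal distribution with standard deviation `noise_sd / sqrt(t)`. That sampling model is an assumption of this example, not part of the problem statement.

```python
import random


def indicator(sample_perf, m):
    """I(t) from (2.2). sample_perf[i] is the observed performance of the
    (i+1)-th design; designs are indexed so the first m are the good ones."""
    return 1 if max(sample_perf[:m]) > max(sample_perf[m:]) else 0


def misalignment_prob(true_perf, m, t, noise_sd=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of Prob[I(t) = 0] under the assumed model."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Sample mean of t i.i.d. normal observations for each design.
        sample = [rng.gauss(j, noise_sd / t ** 0.5) for j in true_perf]
        misses += 1 - indicator(sample, m)
    return misses / trials


if __name__ == "__main__":
    designs = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]  # larger is better
    for t in (1, 10, 100):
        print(t, misalignment_prob(designs, m=2, t=t))
```

Running the loop for increasing t gives an empirical view of how quickly the estimated probability of misalignment falls.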

Our main goal is to examine the behavior of

$$\text{Prob}[I(t) = 1] \quad \text{and} \quad \text{Prob}[I(t) = 0]$$

as a function of time $t$. If $\lim_{t \to \infty} L(\theta, t) = J(\theta)$, a.s., then
([6])

$$\lim_{t \to \infty} \text{Prob}[I(t) = 1] = 1, \qquad \lim_{t \to \infty} \text{Prob}[I(t) = 0] = 0.$$

The convergence rate of $I(t)$ will be taken as the rate at
which $\text{Prob}[I(t) = 0]$ goes to zero (or the rate at which
$\text{Prob}[I(t) = 1]$ goes to 1).


3 The Large Deviation Principle

The concept of large deviation has proven useful in
the study of the convergence of ordinal comparison. The
following definition is adopted from [8].

Definition 3.1 Let $\{X_\epsilon\}$ be a family of random variables
defined on a state space $\mathcal{X}$. We say that $\{X_\epsilon\}$ satisfies
the large deviation principle with a rate function $\phi$ if,
for every closed set $A \subset \mathcal{X}$,

$$\limsup_{\epsilon \to 0} \epsilon \log \text{Prob}[X_\epsilon \in A] \le -\inf_{\lambda \in A} \phi(\lambda),$$

and for every open set $B \subset \mathcal{X}$,

$$\liminf_{\epsilon \to 0} \epsilon \log \text{Prob}[X_\epsilon \in B] \ge -\inf_{\lambda \in B} \phi(\lambda),$$

where the mapping $\phi : \mathcal{X} \to [0, \infty]$ is lower semicontinuous, i.e.,
$\liminf_{\lambda_n \to \lambda} \phi(\lambda_n) \ge \phi(\lambda)$ for all $\lambda \in \mathcal{X}$. □

Now define the logarithmic moment generating function
$\Lambda_\epsilon(s) = \log E[e^{s X_\epsilon}]$ and suppose that the limit
$\Lambda(s) = \lim_{\epsilon \to 0} \epsilon \Lambda_\epsilon(s/\epsilon)$ exists pointwise. The following theorem
from [8] gives an expression for the rate function $\phi$.

Theorem 3.2 Assume that $\{X_\epsilon\}$ satisfies the large
deviation principle with a rate function $\phi$. If $\phi$ is convex
and, for all $c \in R_+$, the set $\{\lambda : \phi(\lambda) \le c\}$ is a compact
subset of $\mathcal{X}$, then $\phi$ is the Fenchel-Legendre transform of
$\Lambda(s)$, namely,

$$\phi(\lambda) = \sup_s \{\lambda s - \Lambda(s)\}.$$ □

If, in addition,

$$\inf_{\lambda \in A^\circ} \phi(\lambda) = \inf_{\lambda \in \bar{A}} \phi(\lambda) \tag{3.3}$$

for $A \subset \mathcal{X}$, where $A^\circ$ and $\bar{A}$ are the interior and closure
of $A$, respectively, then we know from Definition 3.1 and
Theorem 3.2 that

$$\lim_{\epsilon \to 0} \epsilon \log \text{Prob}[X_\epsilon \in A] = -\inf_{\lambda \in \bar{A}} \phi(\lambda). \tag{3.4}$$

Choose $A = (0, \infty)$. Then,

$$\inf_{\lambda \in [0, \infty)} \phi(\lambda) = \inf_{\lambda \in [0, \infty)} \sup_{s \ge 0} \{\lambda s - \Lambda(s)\} = -\inf_{s \in [0, \infty)} \Lambda(s).$$

Therefore,

$$\lim_{\epsilon \to 0} \epsilon \log \text{Prob}[X_\epsilon \in (0, \infty)] = \inf_{s \in [0, \infty)} \Lambda(s). \tag{3.5}$$

It gives an expression for the convergence rate of
$\text{Prob}[X_\epsilon \in (0, \infty)]$ as $\epsilon$ goes to zero.
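As an illustration of the rate expression (3.5), consider a Gaussian family. The example is not taken from the paper; the family $X_\epsilon \sim N(m, \epsilon v)$ with $m < 0$ and $v > 0$ is an assumption chosen because every quantity can be computed in closed form.

```latex
\begin{align*}
\Lambda_\epsilon(s) &= \log E\left[e^{s X_\epsilon}\right] = m s + \tfrac{1}{2}\epsilon v s^2, \\
\Lambda(s) &= \lim_{\epsilon \to 0} \epsilon \Lambda_\epsilon(s/\epsilon) = m s + \tfrac{1}{2} v s^2, \\
\inf_{s \in [0,\infty)} \Lambda(s) &= \Lambda\!\left(-\frac{m}{v}\right) = -\frac{m^2}{2v}.
\end{align*}
```

Expression (3.5) then predicts $\lim_{\epsilon \to 0} \epsilon \log \text{Prob}[X_\epsilon \in (0, \infty)] = -m^2/(2v)$. This agrees with the direct calculation, since $\text{Prob}[X_\epsilon > 0]$ is the standard normal tail probability beyond $-m/\sqrt{\epsilon v}$, whose logarithm behaves like $-m^2/(2 \epsilon v)$ for small $\epsilon$.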


4    Rate  of  Convergence 

The  large  deviation  principle  can  be  used  to  investigate 
the  convergence  rate  of  ordinal  comparison. 

Lemma 4.1 ([16, 19]) Let $f_\epsilon(a)$ be a sequence of functions
on $A$. Suppose $\lim_{\epsilon \to 0} f_\epsilon(a) = f(a)$ for every $a \in A$.
If $\{f_\epsilon(a)\}$ are convex and differentiable at $a_0 \in A$, and $f(a)$ is
differentiable at $a_0$, then

$$\lim_{\epsilon \to 0} \frac{d f_\epsilon(a_0)}{da} = \frac{d f(a_0)}{da}.$$

Theorem 4.2 For any pair of designs $\sigma \in \Theta_g$, $\theta \in \Theta_b$,
let

$$X_t(\sigma, \theta) = L(\theta, t) - L(\sigma, t)$$

and

$$\Lambda(s, \sigma, \theta) = \lim_{t \to \infty} \frac{1}{t} \log E\left[e^{s t X_t(\sigma, \theta)}\right].$$

Assume that

(i) $X_t(\sigma, \theta)$ satisfies the large deviation principle and the
rate expression (3.5) holds,

(ii) $\log E[e^{s t X_t(\sigma, \theta)}]$ and $\Lambda(s, \sigma, \theta)$ are differentiable at
$s = 0$,

(iii) and for every design $\theta \in \Theta$,

$$\lim_{t \to \infty} E[L(\theta, t)] = J(\theta).$$

Then there exist positive constants $\alpha > 0$, $\beta > 0$ such that

$$\text{Prob}[I(t) = 0] \le \beta e^{-\alpha t};$$

in other words, the convergence of ordinal comparison is
exponentially fast. □
Proof.    According  to  the  definition  of  the  indicator 
process, 

Prob[I(t)  =  0]  =  Prob[max  L(a,t)  <  max L(9,t)} 

<r€©9  9e®b 

<  min  Prob[L(a,i)  <  max  L{6,t)} 


9eQb 


min  Prob[  (J  {L(<M)  <  L(0,«)}] 


o-ee 


<  mm        Prob[L(a,  t)  <  L(0,  t)] .  (4.6) 


eeeb 


Since  Xt(<T,  ^)  satisfies  the  large  deviation  principle,  we 
know  from  (3.5)  that 

lim  -  log  Prob[L((7,  *)<£(#,*)] 

t— >oo  t 

1 


=  lim  -logProb[Xt(a,0)  C  (0,  oo)] 

t— >oo  f 

=     inf    A(s,cr,  ^). 

se[o,oo) 


(4.7) 


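Since $\log E[e^{stX_t(\sigma,\theta)}]$ is convex in a neighborhood of $s = 0$ and is differentiable at $s = 0$, we know from Lemma 4.1 that

$$\left.\frac{\partial \Lambda(s,\sigma,\theta)}{\partial s}\right|_{s=0}
= \lim_{t\to\infty} \frac{1}{t} \left.\frac{\partial}{\partial s} \log E\left[e^{stX_t(\sigma,\theta)}\right]\right|_{s=0}
= \lim_{t\to\infty} E[X_t(\sigma,\theta)].$$

The assumption (iii) implies that

$$\lim_{t\to\infty} E[X_t(\sigma,\theta)] = J(\theta) - J(\sigma) < 0.$$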


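Therefore, since $\Lambda(0,\sigma,\theta) = 0$ and its derivative at $s = 0$ is negative, $\Lambda(s,\sigma,\theta) < 0$ for small $s > 0$, which implies that there exists an $\alpha > 0$ such that

$$\inf_{s\in[0,\infty)} \Lambda(s,\sigma,\theta) \le -\alpha.$$

Combining the previous inequality with (4.7) yields

$$\lim_{t\to\infty} \frac{1}{t} \log \mathrm{Prob}[L(\sigma,t) < L(\theta,t)] \le -\alpha.$$

The proof of the theorem follows from this inequality and (4.6). □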
Theorem 4.2 assures the exponential convergence of ordinal comparison under some conditions in terms of the large deviation principle. Other conditions that also guarantee such exponential convergence are given in [6, 10]. In particular cases, the requirement for the rate expression (3.5) can be relaxed. Some important cases are discussed in detail in [6].
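The exponential decay stated in Theorem 4.2 can be observed numerically on a toy problem. The following Python sketch is only an illustration under assumed conditions, not the setting analyzed above: each sample performance $L(\theta,t)$ is taken to be the mean of $t$ i.i.d. unit-variance Gaussian observations, the performance values in `J` and the function name `misalignment_prob` are hypothetical, and $\mathrm{Prob}[I(t) = 0]$ is estimated by plain Monte Carlo.

```python
import math
import random

def misalignment_prob(means, good, t, reps, rng):
    """Estimate Prob[I(t) = 0]: the best observed good design is beaten by a bad one.

    Assumption (not from the paper): L(theta, t) is the mean of t i.i.d.
    N(J(theta), 1) observations, so L(theta, t) ~ N(J(theta), 1/t).
    """
    good = set(good)
    misses = 0
    for _ in range(reps):
        # Sample every design's performance independently.
        L = [m + rng.gauss(0.0, 1.0) / math.sqrt(t) for m in means]
        best_good = max(L[i] for i in good)
        best_bad = max(L[i] for i in range(len(L)) if i not in good)
        misses += best_good < best_bad
    return misses / reps

rng = random.Random(0)
J = [1.0, 0.5, 0.4, 0.2]   # hypothetical true performances; design 0 is the good set
probs = {t: misalignment_prob(J, [0], t, 20_000, rng) for t in (4, 16, 64)}
for t, p in probs.items():
    print(f"t = {t:3d}   estimated Prob[I(t) = 0] = {p:.4f}")
```

Plotting the logarithm of the estimates against $t$ should give a roughly straight line whose slope is an empirical counterpart of $-\alpha$.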

5    Bounds  on  Convergence  Rate 

In this section, we find bounds for the rate of convergence of ordinal comparison. Based on these bounds, we propose simulation schemes that maximize the rate of convergence of the indicator process. The results also offer useful insight into sample path reconstruction in performance analysis.

Let

$$Y_t = \max_{\theta\in\Theta_b} L(\theta,t) - \max_{\sigma\in\Theta_g} L(\sigma,t).$$

If $Y_t$ satisfies the large deviation principle and if the rate expression (3.5) holds, then the convergence rate $r$ of $\mathrm{Prob}[I(t) = 0]$ is

$$r = -\inf_{s\in[0,\infty)} \lim_{t\to\infty} \frac{1}{t} \log E\left[e^{st\left(\max_{\theta\in\Theta_b} L(\theta,t) - \max_{\sigma\in\Theta_g} L(\sigma,t)\right)}\right]. \qquad (5.8)$$

Therefore, maximizing the rate is equivalent to maximizing (5.8). It is sufficient to minimize

$$\Lambda_t(s) \stackrel{\mathrm{def}}{=} E\left[e^{st\left(\max_{\theta\in\Theta_b} L(\theta,t) - \max_{\sigma\in\Theta_g} L(\sigma,t)\right)}\right]$$

for all $s \ge 0$, $t \ge 0$, and consequently

$$r = -\inf_{s\in[0,\infty)} \lim_{t\to\infty} \frac{1}{t} \log \Lambda_t(s). \qquad (5.9)$$


A function $g(X_1,X_2) : R^2 \to R$ is said to be superadditive if, for any $X_1 \le X_1'$, $X_2 \le X_2'$,

$$g(X_1',X_2) + g(X_1,X_2') \le g(X_1,X_2) + g(X_1',X_2'). \qquad (5.10)$$

If $g(X_1,X_2)$ is twice differentiable, it is superadditive if and only if $\partial^2 g(X_1,X_2)/\partial X_1 \partial X_2 \ge 0$. The following result is due to [2, 20].
Lemma 5.1 Let $X_1, X_2$ be random variables with marginal distributions $F_i(x_i)$, $i = 1, 2$, and joint distribution $H(x_1,x_2)$. Assume that $g(X_1,X_2)$ is right-continuous and superadditive and $E[g(X_1,X_2)]$ is finite for all $H(x_1,x_2)$. Then

$$\sup_{\{H\}} E[g(X_1,X_2)] = E[g(F_1^{-1}(u), F_2^{-1}(u))] \qquad (5.11)$$

$$\inf_{\{H\}} E[g(X_1,X_2)] = E[g(F_1^{-1}(u), F_2^{-1}(1-u))] \qquad (5.12)$$

where the random variable $u$ is uniform over $[0,1]$ and the inverse $F^{-1}(u)$ is defined as

$$F^{-1}(u) = \inf\{x \in R \mid F(x) > u\}. \qquad \Box$$

Lemma 5.1 says that $E[g(X_1,X_2)]$ is maximized by sampling $X_1$ and $X_2$ using the same random number $u$ according to

$$X_i = F_i^{-1}(u), \qquad i = 1, 2.$$

This scheme of generating samples of random variables is known as the scheme of common random numbers (CRN). It is perhaps the most popular scheme for variance reduction in simulation.
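The two extremes in Lemma 5.1 can be illustrated with a small inverse-transform experiment. The Python sketch below is a hedged example with assumed ingredients: the marginals are exponential with hypothetical rates 1 and 2, and the superadditive function is $g(x_1,x_2) = x_1 x_2$ (its mixed partial derivative is 1). It compares the sample mean of $g$ under a common random number, independent random numbers, and the antithetic pair $(u, 1-u)$.

```python
import math
import random

def exp_quantile(u, rate):
    """Inverse c.d.f. of an exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate

def mean_g(g, pairs):
    """Sample mean of g over a list of (x1, x2) pairs."""
    return sum(g(x1, x2) for x1, x2 in pairs) / len(pairs)

def g(x1, x2):
    # Superadditive test function: d^2 g / dx1 dx2 = 1 >= 0.
    return x1 * x2

rng = random.Random(1)
n = 50_000
eps = 1e-12  # keep u strictly inside (0, 1) so both u and 1 - u are safe in log
us = [rng.uniform(eps, 1.0 - eps) for _ in range(n)]
vs = [rng.uniform(eps, 1.0 - eps) for _ in range(n)]

crn = [(exp_quantile(u, 1.0), exp_quantile(u, 2.0)) for u in us]          # same u
indep = [(exp_quantile(u, 1.0), exp_quantile(v, 2.0)) for u, v in zip(us, vs)]
anti = [(exp_quantile(u, 1.0), exp_quantile(1.0 - u, 2.0)) for u in us]   # u and 1 - u

mean_crn, mean_indep, mean_anti = (mean_g(g, p) for p in (crn, indep, anti))
print(mean_crn, mean_indep, mean_anti)
```

The common-random-number mean should come out largest and the antithetic mean smallest, matching the ordering implied by (5.11) and (5.12).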

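Fix $t$ and denote by $G_\theta(w) = \mathrm{Prob}[L(\theta,t) \le w]$ the c.d.f. of $L(\theta,t)$ for each $\theta \in \Theta$. Define

$$\overline{G}_g(w) \stackrel{\mathrm{def}}{=} \min_{\sigma\in\Theta_g} G_\sigma(w), \qquad \underline{G}_g(w) \stackrel{\mathrm{def}}{=} \max\Big\{0,\; 1 - \sum_{\sigma\in\Theta_g} [1 - G_\sigma(w)]\Big\},$$

$$\overline{G}_b(w) \stackrel{\mathrm{def}}{=} \min_{\theta\in\Theta_b} G_\theta(w), \qquad \underline{G}_b(w) \stackrel{\mathrm{def}}{=} \max\Big\{0,\; 1 - \sum_{\theta\in\Theta_b} [1 - G_\theta(w)]\Big\}.$$

Then the following theorem gives lower and upper bounds for the convergence rate $r$ of ordinal comparison.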

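Theorem 5.2 Suppose that $Y_t$ satisfies the large deviation principle and that the rate expression (3.5) holds. Then

$$-\inf_{s\in[0,\infty)} \lim_{t\to\infty} \frac{1}{t} \log \overline{\Lambda}_t(s) \;\le\; r \;\le\; -\inf_{s\in[0,\infty)} \lim_{t\to\infty} \frac{1}{t} \log \underline{\Lambda}_t(s) \qquad (5.13)$$

where

$$\underline{\Lambda}_t(s) = E\left[e^{st\left(\overline{G}_b^{-1}(u) - \underline{G}_g^{-1}(u)\right)}\right],$$

$$\overline{\Lambda}_t(s) = E\left[e^{st\left(\underline{G}_b^{-1}(u) - \overline{G}_g^{-1}(1-u)\right)}\right]. \qquad \Box$$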
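Proof. Let

$$W_1 = \max_{\theta\in\Theta_b} L(\theta,t), \qquad W_2 = \max_{\sigma\in\Theta_g} L(\sigma,t).$$

For any $t \ge 0$, $s \ge 0$, consider the function

$$g(W_1,W_2) = -e^{st(W_1 - W_2)}. \qquad (5.14)$$

It is continuous and superadditive in $(W_1, W_2)$. Therefore, Lemma 5.1 applies and, for all $t \ge 0$, all $s \ge 0$, and all $\theta \in \Theta_b$,

$$\sup_{\{H_W\}} E\left[-e^{st(W_1 - W_2)}\right] = E\left[-e^{st\left(F_{W_1}^{-1}(u) - F_{W_2}^{-1}(u)\right)}\right], \qquad (5.15)$$

$$\inf_{\{H_W\}} E\left[-e^{st(W_1 - W_2)}\right] = E\left[-e^{st\left(F_{W_1}^{-1}(u) - F_{W_2}^{-1}(1-u)\right)}\right] \qquad (5.16)$$

where $\{H_W\}$ is the set of all correlations among $(W_1, W_2)$ and $F_{W_1}$ and $F_{W_2}$ are the distribution functions of $W_1$ and $W_2$, respectively.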

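However, from the definitions of $W_1$ and $W_2$ and according to basic probability properties, we know that

$$\mathrm{Prob}[W_1 \le w] = \mathrm{Prob}\Big[\max_{\theta\in\Theta_b} L(\theta,t) \le w\Big] \le \min_{\theta\in\Theta_b} \mathrm{Prob}[L(\theta,t) \le w] = \min_{\theta\in\Theta_b} G_\theta(w) = \overline{G}_b(w).$$

On the other hand,

$$\mathrm{Prob}[W_1 > w] = \mathrm{Prob}\Big[\max_{\theta\in\Theta_b} L(\theta,t) > w\Big] \le \sum_{\theta\in\Theta_b} \mathrm{Prob}[L(\theta,t) > w] = \sum_{\theta\in\Theta_b} [1 - G_\theta(w)]$$

which implies

$$\mathrm{Prob}[W_1 \le w] \ge \max\Big\{0,\; 1 - \sum_{\theta\in\Theta_b} [1 - G_\theta(w)]\Big\} = \underline{G}_b(w).$$

Therefore,

$$\underline{G}_b(w) \le F_{W_1}(w) \le \overline{G}_b(w). \qquad (5.17)$$

Similarly,

$$\underline{G}_g(w) \le F_{W_2}(w) \le \overline{G}_g(w). \qquad (5.18)$$

With those bounds, it is easy to verify that

$$-e^{st\left(F_{W_1}^{-1}(u) - F_{W_2}^{-1}(u)\right)} \le -e^{st\left(\overline{G}_b^{-1}(u) - \underline{G}_g^{-1}(u)\right)},$$

$$-e^{st\left(F_{W_1}^{-1}(u) - F_{W_2}^{-1}(1-u)\right)} \ge -e^{st\left(\underline{G}_b^{-1}(u) - \overline{G}_g^{-1}(1-u)\right)}$$

for any $st \ge 0$. Combining this with (5.15) and (5.16), we know that

$$E\left[e^{st\left(\overline{G}_b^{-1}(u) - \underline{G}_g^{-1}(u)\right)}\right] \le E\left[e^{stY_t}\right] \le E\left[e^{st\left(\underline{G}_b^{-1}(u) - \overline{G}_g^{-1}(1-u)\right)}\right] \qquad (5.19)$$

or equivalently,

$$\overline{\Lambda}_t(s) \ge \Lambda_t(s) \ge \underline{\Lambda}_t(s).$$

The combination of the preceding inequality with (5.9) yields (5.13). The proof is thus complete. □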

Theorem 5.2 makes it possible to design simulation experiments so that the rate of convergence of ordinal comparison is maximized.

Corollary 5.3 Assume $M = 1$. Suppose that $Y_t$ satisfies the large deviation principle and that the rate expression (3.5) holds. Then the rate of convergence of the indicator process is maximized by sampling $L(\theta,t)$ using the scheme of CRN.

Proof. For $M = 1$, $\Theta_g = \{\theta_1\}$ contains only one design $\theta_1$. In this case, $\overline{G}_g(w) = \underline{G}_g(w) = G_{\theta_1}(w)$. Then Theorem 5.2 shows that the rate of convergence of ordinal comparison is given by (5.9), in which $\Lambda_t(s)$ is bounded from below by

$$\Lambda_t(s) \ge \underline{\Lambda}_t(s) = E\left[e^{st\left(\overline{G}_b^{-1}(u) - G_{\theta_1}^{-1}(u)\right)}\right],$$

that is

$$\inf_{\{H_L\}} \Lambda_t(s) = E\left[e^{st\left(\overline{G}_b^{-1}(u) - G_{\theta_1}^{-1}(u)\right)}\right]$$

where $\{H_L\}$ is the set of all correlations among $\{L(\theta,t)\}$. The lower bound can be achieved by sampling all $L(\theta,t)$ using the scheme of CRN. Therefore, (5.8), or the convergence rate of ordinal comparison, is maximized by the scheme of CRN. □
As far as ordinal comparison is concerned, it is important to be able to design simulation experiments so that the upper bound in (5.13) is achieved. Let us consider this problem for different values of $M$.

(i) $M = 1$. In this case, Corollary 5.3 shows that the upper bound in (5.13) can be achieved by the scheme of CRN.

(ii) $M = 2$. We know from the proof of Theorem 5.2 that the upper bound is achieved when $W_1$ and $W_2$ are sampled using the scheme of CRN, the distribution of $W_1 = \max_{\theta\in\Theta_b} L(\theta,t)$ is

$$F_{W_1}(w) = \overline{G}_b(w), \qquad (5.20)$$

and the distribution of $W_2 = \max_{\sigma\in\Theta_g} L(\sigma,t)$ is

$$F_{W_2}(w) = \underline{G}_g(w). \qquad (5.21)$$

The distribution (5.20) is achieved by setting

$$L(\theta,t) = G_\theta^{-1}(u), \qquad \theta \in \Theta_b.$$

The distribution (5.21) is achieved by setting

$$L(\theta_1,t) = G_{\theta_1}^{-1}(v), \qquad L(\theta_2,t) = G_{\theta_2}^{-1}(1 - v).$$

It is easy to verify that such a sampling scheme yields the upper bound in (5.13). However, it should be pointed out that such a scheme is not practical because the sets $\Theta_g$ and $\Theta_b$ are unknown a priori. Nevertheless, this fact shows that the scheme of CRN does not give the maximum rate for ordinal comparison.

(iii) $M \ge 3$. In this case, it can be shown using counterexamples that $\underline{G}_g(w)$ is not a distribution but can be asymptotically achieved by a sequence of random variables [20]. As a result, it is impossible to find sampling schemes that achieve the bounds in (5.13) for $M \ge 3$. This means that the scheme of CRN does not give the maximum rate for ordinal comparison for $M \ge 3$.
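The two extreme distributions of a maximum discussed in the cases above can be checked empirically. The Python sketch below rests on assumptions that are not in the paper: two designs with exponential performance distributions (hypothetical rates 1 and 2) and a single evaluation point $w = 1$. Driving both designs with the same $u$ should reproduce the bound $\min_\theta G_\theta(w)$, while the antithetic pair $(u, 1-u)$ should reproduce $\max\{0, 1 - \sum_\theta [1 - G_\theta(w)]\}$.

```python
import math
import random

def exp_quantile(u, rate):
    """Inverse c.d.f. of an exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate

rng = random.Random(2)
n, w = 50_000, 1.0
rates = (1.0, 2.0)                              # hypothetical designs
G = [1.0 - math.exp(-r * w) for r in rates]     # marginal c.d.f. values at w

upper_bound = min(G)                                        # min over designs
lower_bound = max(0.0, 1.0 - sum(1.0 - g for g in G))       # Frechet-type lower bound

hits_common = hits_anti = 0
eps = 1e-12
for _ in range(n):
    u = rng.uniform(eps, 1.0 - eps)
    # Common random number: both designs are driven by the same u.
    x1, x2 = exp_quantile(u, rates[0]), exp_quantile(u, rates[1])
    hits_common += max(x1, x2) <= w
    # Antithetic pair: the second design is driven by 1 - u.
    y2 = exp_quantile(1.0 - u, rates[1])
    hits_anti += max(x1, y2) <= w

cdf_common = hits_common / n
cdf_anti = hits_anti / n
print(cdf_common, upper_bound)
print(cdf_anti, lower_bound)
```

With more than two designs no single antithetic construction attains the lower bound, which is the difficulty noted for $M \ge 3$.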


To sum up, it is difficult, if not impossible, in practical simulation to achieve the maximum rate for $M \ge 2$. For $M = 1$, that is, for the problem of selecting the best from $N$ possible designs, Corollary 5.3 shows that the maximum rate of convergence in (5.13) can be achieved by sampling using the scheme of CRN. This confirms the positive role of CRN when $L(\theta,t)$ can be sampled from a single random number $u$. Of course, for practical problems $L(\theta,t)$ has a more complicated form. In simulations or sample path constructions of DEDS, $L(\theta,t)$ is usually a function of a set of random numbers, i.e.,

$$L(\theta,t) = L(u(1,\theta), u(2,\theta), \ldots, u(n,\theta))$$

where $n$ is an integer determined by the system structure and $t$. It is possible to write $L(\theta,t)$ as a non-decreasing function of $u(i,\theta)$ for a large class of DEDS (we do not describe the exact class of systems because of limited space; see [7] for further details). The following discussion is restricted to a case that is easily realizable in simulation and that has an explicit solution. We assume that $\{(u(n,\theta_1), u(n,\theta_2), \ldots, u(n,\theta_N))\}$ is a sequence of independent random vectors. We would like to find the correlation among $u(n,\theta_1), u(n,\theta_2), \ldots, u(n,\theta_N)$, for every fixed $n$, that maximizes the rate of ordinal comparison. This scheme can be easily realized and is the most useful.

Assume again $M = 1$. For any $\theta \in \Theta_b$, let

$$X_t(\theta) = L(\theta,t) - L(\theta_1,t).$$

Then it has been proved that the scheme of CRN maximizes the rate of convergence of ordinal comparison, as summarized by the following theorem. The details can be found in [7].

Theorem 5.4 Assume that $L(\theta,t)$ is right-continuous and non-decreasing in $u(n,\theta)$ for every $t$ and every $n \ge 1$. If, for every $\theta \in \Theta_b$, $X_t(\theta)$ satisfies the large deviation principle and if the rate expression (3.5) holds, then the rate of convergence for the indicator process is maximized by choosing

$$u(n,\theta) = v(n), \qquad (5.22)$$

where $\{v(n)\}$ is a sequence of i.i.d. random numbers uniform on $[0,1]$; in other words, the rate is maximized by using the scheme of CRN. □

Additional results regarding the effect of correlation on the convergence of ordinal comparison can be found in [7, 10, 15].

The following example, modified from an example of [12], illustrates the difference in the rates of convergence for independent simulations and simulations using CRN.

Example 5.4 Consider an M/G/1 FCFS queue with arrival rate $\lambda$. The service times have the Weibull distribution $G(x) = 1 - e^{-(\gamma x)^\alpha}$, $x \ge 0$, for constants $\gamma, \alpha > 0$. Let the design parameter be $\theta = (\lambda, \gamma, \alpha)$. The performance measure is the mean queue length in steady state. We consider 5 designs with the following parameters:

[Table of design parameters not recoverable from the source.]

The performance is taken as the negative of the mean queue length. The true best design is $\theta_1$. The simulation performances for independent simulations and for simulations using CRN are shown in Figures 1 and 2, respectively. Note that the displayed time is the real queue time, not the simulation time. The simulation was completed within seconds.


[Figure 1. Independent simulations. (a) Sample performances; (b) indicator process. Horizontal axes run from 0 to 2000.]

[Figure 2. Simulations using CRN. (a) Sample performances; (b) indicator process. Horizontal axes run from 0 to 2000.]

Figures 1 and 2 show that correlation could significantly speed up the convergence of ordinal comparison.
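The effect shown in the example can be reproduced in miniature. The Python sketch below is an illustration under stated assumptions rather than a reconstruction of the experiment: the two designs are hypothetical (the original parameter table is missing), the mean waiting time computed with the Lindley recursion stands in for the mean queue length, and the helper names are invented. Each design is driven either by its own uniforms or by one shared stream of uniforms (CRN), and the fraction of replications that rank the two designs correctly is reported.

```python
import math
import random

def weibull_quantile(u, gamma, alpha):
    """Inverse of G(x) = 1 - exp(-(gamma * x) ** alpha)."""
    return (-math.log(1.0 - u)) ** (1.0 / alpha) / gamma

def mean_wait(lam, gamma, alpha, uniforms):
    """Mean waiting time of the simulated customers (Lindley recursion)."""
    w, total = 0.0, 0.0
    for u_arr, u_srv in uniforms:
        total += w
        service = weibull_quantile(u_srv, gamma, alpha)
        interarrival = -math.log(1.0 - u_arr) / lam
        w = max(0.0, w + service - interarrival)
    return total / len(uniforms)

def correct_order_rate(design_a, design_b, n_customers, reps, common, rng):
    """Fraction of replications in which design_a shows the smaller mean wait."""
    hits = 0
    for _ in range(reps):
        ua = [(rng.random(), rng.random()) for _ in range(n_customers)]
        # CRN reuses the same uniforms; otherwise design_b gets a fresh stream.
        ub = ua if common else [(rng.random(), rng.random()) for _ in range(n_customers)]
        hits += mean_wait(*design_a, ua) < mean_wait(*design_b, ub)
    return hits / reps

rng = random.Random(3)
better = (0.5, 1.0, 2.0)   # hypothetical (lambda, gamma, alpha): faster service
worse = (0.5, 0.9, 2.0)    # hypothetical: slower service, same arrivals
indep_rate = correct_order_rate(better, worse, 200, 300, False, rng)
crn_rate = correct_order_rate(better, worse, 200, 300, True, rng)
print("independent:", indep_rate, " CRN:", crn_rate)
```

With these particular designs the service times under CRN are ordered on every sample path, so the ranking is correct in essentially every replication, whereas independent streams misorder the designs in a noticeable fraction of runs.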
ABSTRACT

The paper presents an approach to building an Integrated Learning Architecture in which Case-Based Reasoning is used both as a corrective for solutions inferred by other problem-solving paradigms and as a method for accumulating and refining the knowledge of a knowledge based system. The proposed architecture is intended to solve the classification task. The applications of the approach to case-based maintenance of rule-based systems and to case-based refinement of neural networks are briefly described.

KEYWORDS: Machine Learning, Integrated Learning Architecture, Case-Based Reasoning, Knowledge Refinement, Neural Networks.


1 INTRODUCTION

An important requirement for modern knowledge based systems (KBS) is the ability to operate in a real (open) environment [1]. To do this, they should be provided with mechanisms that allow them to automatically correct the partially incomplete and incorrect expert knowledge they use. This means that a modern KBS should be able to learn.

On the other hand, the modern technology of KBS development has changed radically. The emphasis is now on analyzing, identifying and adapting already developed KBS components (databases, acquired domain knowledge, KBS shells and models) rather than on developing a KBS "from scratch".

A promising approach to solving both problems is the development of Integrated Learning Architectures (ILA), which are able to use several different problem-solving paradigms and/or machine learning methods [15, 1, 2].

The present paper describes an approach to building such an ILA in which case-based reasoning (CBR) [10] is used both as a corrective for solutions inferred by other problem-solving paradigms and as a method for accumulating and refining system knowledge. The proposed ILA is intended to solve the classification task, which is assumed to be described by one of its existing generic models (e.g. [5, 14]).

*This research was partially supported by National Science Foundation, Grant 1-606/96

2 INTEGRATED PROBLEM-SOLVING AND LEARNING

2.1. The System Architecture

The case-based ILA structure (see Fig. 1) consists of three main parts: an original KBS, a Case Base Formation Module (CBF) and a case-based reasoning module (CB-Reasoner).

2.1.1.  The  Case  Base  Formation  Module 

The CBF module uses domain knowledge, training examples and the KBS problem-solving model to create a memory of cases. A case is represented as a single information structure containing the case name, a list of case features (attribute-value pairs) and the case solution. The module should decide i) which training examples are worth transforming and storing in the memory as useful past cases, ii) how the memory should be indexed so that only cases relevant to the current situation are retrieved, and iii) how to construct failure predictors for recognizing potentially "dangerous" situations in which the KBS solutions should be corrected. The module should also be able to choose an appropriate similarity metric, which measures the degree of likeness between the problem at hand and the retrieved past cases. The CBF module runs off-line and initializes the CB-Reasoner.
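The case structure and indexed memory described above might be represented as follows. This is a minimal Python sketch under assumptions: the class names `Case` and `CaseMemory`, the form of the index keys and the sample values are hypothetical, since the text does not specify data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """One case: a name, attribute-value features and a solution (class label)."""
    name: str
    features: dict
    solution: str

@dataclass
class CaseMemory:
    """Cases plus indexes mapping an index key to the names of relevant cases."""
    cases: list = field(default_factory=list)
    indexes: dict = field(default_factory=dict)

    def store(self, case, keys):
        # Remember the case and register it under every index key it plays a role for.
        self.cases.append(case)
        for key in keys:
            self.indexes.setdefault(key, []).append(case.name)

    def retrieve(self, key):
        # Return only the cases registered under the given index key.
        wanted = set(self.indexes.get(key, []))
        return [c for c in self.cases if c.name in wanted]

memory = CaseMemory()
memory.store(Case("case-1", {"age": "40-49", "node-caps": "no"}, "no-recurrence"),
             keys=[("confirms", "rule-7")])
retrieved = memory.retrieve(("confirms", "rule-7"))
print([c.name for c in retrieved])
```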

2.1.2.  The  Case-Based  Reasoner 

The input to the reasoner is the output case and its solution, formed after the KBS has solved the problem. The CB-reasoner architecture is an instance of the general architecture of a case-based planner adapted for solving the classification task [9, 2].


[Figure 1. The Case-Based Integrated Learning Architecture. Labels recoverable from the diagram: Domain Knowledge, Training Examples, KBS Model, Input Case, KBS, CBF Module, Output Case, KBS Solutions, Case-Based Reasoner, Modifier, Best Case, Indexes, Failure Explanation, Case Memory, Repaired Case, Repairer, End User.]


The KBS solutions are initially processed by the Analyzer, which decides whether the solutions must be corrected and, if so, which set of indexes should be used to determine relevant past cases. The indexes and the current problem description, along with the predefined similarity metric, are used by the Retriever to find the most similar ("best") case whose solution could be applied to the current problem. The decision about which solution should be preferred, the one provided by the KBS or the one provided by the CB-reasoner, is the responsibility of the Modifier. The final solution is given to the end user, and the Storer forms the current case and "remembers" it in the memory.

After receiving "feedback" from the user, who approves or rejects the final solution, the Repairer starts its operation. In case of approval, the Repairer sends the confirmation to the Storer, which then decides whether it is worthwhile to continue remembering this correctly solved case or whether it may be forgotten. Otherwise, the Repairer tries to "explain" the failed case. In such situations the Repairer creates failure predictors, which are then used by the Analyzer to avoid repeating similar failures in the future.

2.2.  Failure  Prediction 

The search for a case-based solution starts only when the current situation is recognized as "dangerous", i.e. if there is sufficiently convincing evidence that the KBS solution is wrong. This prediction process is based on comparing the KBS solution to the set of failure predictors formed by the CBF module and extended by the Repairer. The roles of such predictors are played by the KBS solutions of the training examples included in the case base (if such examples are available) and/or by the system solutions recognized as wrong by the user. In other words, the fact that the KBS has once inferred an erroneous solution is considered sufficiently convincing evidence for suspecting the system every time it infers the same solution for another problem. Conversely, if the KBS solution of a new problem is different from the known failure predictors, there is no reason to "suspect" the system and its solution is accepted without any doubt.
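The prediction rule just described amounts to a membership test on previously failed solutions. The Python sketch below is a minimal illustration; the class name `FailurePredictor`, its methods and the sample labels are hypothetical, and the predictors are reduced to plain solution labels.

```python
class FailurePredictor:
    """Flags a KBS solution as 'dangerous' if the same solution has failed before."""

    def __init__(self, known_failures=()):
        # Predictors formed off-line (e.g. from wrongly solved training examples).
        self._suspect = set(known_failures)

    def is_dangerous(self, kbs_solution):
        # A solution that was wrong once is suspected every time it is inferred again.
        return kbs_solution in self._suspect

    def record_failure(self, kbs_solution):
        # Called when the user rejects a final solution that came from the KBS.
        self._suspect.add(kbs_solution)

predictor = FailurePredictor(known_failures={"lung"})
before = predictor.is_dangerous("stomach")   # never failed so far: accepted as is
predictor.record_failure("stomach")          # the user rejected this solution once
after = predictor.is_dangerous("stomach")    # now triggers case-based correction
print(before, after)
```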

2.3.  Case  Retrieval  and  Conflict  Reconciling 

The success of integrated approaches combining several problem-solving paradigms crucially depends on the scheme for reconciling the conflicts between them. Most of the proposed methods (see e.g. [8] for integration of CBR with rule-based reasoning) use a threshold scheme in which the threshold values are selected ad hoc. We avoid this deficiency by forming a proper set of cases to be retrieved. In this set we include not only the cases denying the proposed KBS solution but also the cases confirming it. In this way the retrieval set reflects the ILA problem-solving experience and restricts the search space to the cases that are relevant from the ILA point of view.

3 CASE-BASED MAINTENANCE OF RULE-BASED SYSTEM

The architecture described above was applied to the task of improving the problem-solving behavior of a rule-based system (RBS) in the course of its operation [3], an important part of the adaptive maintenance task [6]. In this application the KBS training examples are assumed to be unknown.¹

¹ See [2] for a more detailed description of the task.

3.1. Knowledge Components

Domain Knowledge consists of non-probabilistic rules represented as flat structures associating conjunctions of problem features with problem solutions. A feature is a nominal attribute-value pair. The KBS Model treats the system's problem-solving behavior as a hypothesis-driven process.

It is assumed that the KBS is able to perform both backward and forward chaining, since forward chaining is normally used for generating a list of differential hypotheses (possible solutions) and backward chaining for testing them. A hypothesis is considered to be confirmed if there is a satisfied rule having the solution as its conclusion, and to be rejected if all rules leading to it have failed. The system stops its operation either when a confirmed hypothesis has been found or when all generated hypotheses have been tested and rejected.

3.2. Case Base Formation

To avoid the problem of absent training examples, the CBF module treats the rules as generalized "typical" cases describing the domain concepts. Each rule is interpreted as a case represented by a list of features extracted from the rule conditions. The rule conclusion defines the case classification. The typicality of a case is defined as its family resemblance and is measured by the ratio of its intra-concept similarity to its inter-concept similarity. The intra-concept similarity of a case is its average similarity to other members of the same concept, and the inter-concept similarity is its average similarity to members of all other concepts. The similarity $s(X,Y)$ between two cases with known classifications is defined as the inverse of the distance between them [18]: $s(X,Y) = 1 - \delta(X,Y)$, where $\delta(X,Y)$ is computed as the normalized Euclidean distance between the corresponding cases.

[The displayed formula for $\delta(X,Y)$ is not recoverable from the source.]

Here $x_j$ ($y_j$) is the value of the $j$-th attribute of case $X$ ($Y$), $k = |A_x \cup A_y|$, and $A_z$ ($z = x, y$) is the set of attributes of the corresponding case. The weight $w_j$ denotes the importance of the $j$-th attribute of the case and is calculated as the ratio of the number of cases containing this attribute to the total number of cases (rules) in the rule base. The difference $x_j - y_j = 1$ if $x_j \ne y_j$ and $x_j - y_j = 0$ otherwise. For missing values, $(x_j - y_j)^2$ is given by an expression in $L_j$, the number of possible values of the $j$-th attribute [expression illegible in the source].

3.3. Indexing Scheme

A case is indexed by all possible roles it may play in the process of rule-based problem solving. Each solved case is indexed as true negative (TN) by each hypothesis rejected during problem solving and as true positive (TP) by the solution found, along with the name of the rule that inferred this solution.

TP- and TN-indexes are used for indexing cases which have been successfully solved by the RBR system, i.e. when the user has confirmed their solutions. A case is indexed as false positive (FP) by the rule if it satisfies the rule but the real case solution differs from the inferred one. A case is indexed as false negative (FN) with respect to (w.r.t.) a given concept, which is the real solution of the case, if it has been tested and rejected by the concept rules.

A failed case is indexed as untested w.r.t. a given concept (its real solution) if the solution has not been tested by the concept rules.
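The role assignment described in this subsection can be sketched as a small function. The Python code below is one simplified reading, not the authors' implementation: the function name `index_roles`, the argument names and the shape of the returned keys are hypothetical, and the failure analysis performed by the CB-reasoner is not modeled.

```python
def index_roles(inferred, firing_rule, rejected, real):
    """Return (role, key) pairs for one solved case.

    inferred    -- solution proposed by the rules (None if none was confirmed)
    firing_rule -- name of the rule that inferred it (None if none)
    rejected    -- hypotheses tested and rejected during problem solving
    real        -- the real solution, as confirmed by the user
    """
    roles = []
    if inferred is not None:
        # The firing rule was right (true positive) or wrong (false positive).
        role = "TP" if inferred == real else "FP"
        roles.append((role, (inferred, firing_rule)))
    for hypothesis in rejected:
        # Rejecting the real solution is a false negative; any other rejection is correct.
        roles.append(("FN" if hypothesis == real else "TN", hypothesis))
    if inferred != real and real not in rejected:
        # The real solution was never examined by its concept rules.
        roles.append(("untested", real))
    return roles

solved = index_roles("c1", "r3", ["c2"], real="c1")
failed = index_roles("c1", "r3", ["c2"], real="c2")
missed = index_roles(None, None, ["c1", "c2"], real="c3")
print(solved, failed, missed, sep="\n")
```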

3.4. Case Matching

The distance between a stored case $X$ and a new problem $Y$ is defined as $\Delta(X,Y) = W_X \cdot \delta(X,Y)$, where $\delta(X,Y)$ is the distance measure described in the previous section and $W_X$ is the weight of $X$. The weight of a stored case is simply the reciprocal of its typicality. The matching procedure is organized as a nearest neighbor algorithm in which a new case is classified according to the class of the best matched case, i.e. the case with the minimum value of the distance $\Delta$. In the case of ties, the most typical case (i.e. the one with the minimum value of the weight $W$) is preferred.

When measuring the similarity between an example with unknown classification and a case belonging to a known category, it is natural to consider only those features of the example which are relevant to the case class. For each category the corresponding set of relevant attributes is defined as the union of all attributes which the rules corresponding to this category refer to. In this way we avoid the influence of any features that are redundant for a particular category.
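The typicality weighting and nearest neighbor matching can be sketched as follows. Because the exact formula for $\delta$ is not available here, the Python code below substitutes an assumed form, $\delta = \sqrt{\sum_j w_j d_j / \sum_j w_j}$ with $d_j = 1$ for differing (or missing) values and 0 otherwise; that form, all function names and the toy cases are assumptions made only for illustration, and the restriction to category-relevant attributes is omitted.

```python
import math

def attribute_weights(cases):
    """w_j = fraction of cases (rules) that contain attribute j."""
    counts = {}
    for features, _ in cases:
        for attr in features:
            counts[attr] = counts.get(attr, 0) + 1
    return {attr: c / len(cases) for attr, c in counts.items()}

def delta(x, y, w):
    """ASSUMED normalized distance in [0, 1]; the paper's exact formula is unknown."""
    attrs = set(x) | set(y)
    total = sum(w.get(a, 1.0) for a in attrs)
    if total == 0:
        return 0.0
    mismatch = sum(w.get(a, 1.0) for a in attrs if x.get(a) != y.get(a))
    return math.sqrt(mismatch / total)

def typicality(i, cases, w):
    """Ratio of intra-concept to inter-concept average similarity s = 1 - delta."""
    feats, label = cases[i]
    same = [1 - delta(feats, f, w) for j, (f, l) in enumerate(cases) if j != i and l == label]
    other = [1 - delta(feats, f, w) for f, l in cases if l != label]
    intra = sum(same) / len(same) if same else 1.0
    inter = sum(other) / len(other) if other else 1.0
    return intra / inter if inter > 0 else float("inf")

def classify(problem, cases, w):
    """Nearest neighbor on Delta = W * delta, ties broken by the smaller weight W."""
    best_key, best_label = None, None
    for i, (feats, label) in enumerate(cases):
        weight = 1.0 / max(typicality(i, cases, w), 1e-9)  # W is the reciprocal of typicality
        key = (weight * delta(feats, problem, w), weight)
        if best_key is None or key < best_key:
            best_key, best_label = key, label
    return best_label

cases = [  # hypothetical generalized cases: (features, class)
    ({"a": "1", "b": "x", "c": "p", "d": "k"}, "pos"),
    ({"a": "1", "b": "y", "c": "p", "d": "k"}, "pos"),
    ({"a": "2", "b": "z", "c": "p", "d": "k"}, "neg"),
    ({"a": "2", "b": "z", "c": "q", "d": "k"}, "neg"),
]
w = attribute_weights(cases)
print(classify({"a": "1", "b": "x", "c": "p", "d": "k"}, cases, w))
```

Multiplying the distance by the reciprocal of typicality pulls a new problem toward highly typical cases, which is the intended effect of the weighting.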

3.5.  Storing  New  Cases 

All cases erroneously solved by the system are stored. Such cases are indexed as false positive or true negative w.r.t. the faulty solution and as false negative, true positive or untested (depending on the results of the failure analysis made by the CB-reasoner) w.r.t. the real solution of the problem. The solved case is also indexed as true negative w.r.t. all hypotheses tested and rejected by the system during the problem-solving session.

A case successfully solved by the system is stored only if its solution has been found as a result of reconciling a conflict between the paradigms and an identical case has not been stored. Such cases confirm the correctness of applying the concrete rule or using the concrete case to obtain the problem solution. For each newly stored case the value of the weight W reflecting the case typicality is calculated.

3.6. Selection of Cases to be Retrieved

The set of cases to be retrieved is determined by comparing the main characteristics of the current situation (the rule which has inferred the problem solution and the set of rejected hypotheses) with the indexes connecting stored cases with rules. The retrieval set is formed not only by the exceptional cases rejecting the particular rule but also by the cases confirming it. Since the set of such true positive cases may be empty, the case representing the rule itself is also retrieved.

In situations when no KBS solution has been found, all generalized cases associated with the rejected hypotheses are retrieved along with all solved cases not covered by the corresponding rules. In both situations the final solution is determined by the best matched case.

3.7.  Empirical  Evaluation 

The approach was implemented in the experimental ILA system CoRCase (Correcting Rules by Cases) and tested on two medical domains, prognosis of breast cancer recurrence (BC) and location of primary tumor (PT), which are benchmark databases² well known in the ML community (see Table 1).


Domain   Examples   Classes   Attrs   Vals/Attr
BC       286        2         9       5.8
PT       339        22        17      2.2

Table 1. Main characteristics of the databases used in the empirical evaluation.


For the evaluation of the results the random sub-sampling strategy was used [17]. Each database was randomly split into two non-overlapping subsets, one for training (70% of examples) and one for testing (30% of examples). The experiments were repeated ten times for different splits and the results were averaged.
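The evaluation protocol can be written down in a few lines. The Python sketch below is a generic illustration of repeated random sub-sampling with a 70/30 split and ten repetitions; the function name `random_subsampling`, the seed, the toy data and the majority-class evaluator are hypothetical stand-ins for the actual learning systems.

```python
import random
import statistics

def random_subsampling(examples, evaluate, train_fraction=0.7, repeats=10, seed=0):
    """Average evaluate(train, test) over several random non-overlapping splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = examples[:]          # copy, so the caller's list is untouched
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)

def majority_class_accuracy(train, test):
    """Toy evaluator: predict the most frequent training label for every test example."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(label == majority for _, label in test) / len(test)

# Hypothetical data set: 90 examples, one third labelled "neg".
data = [(i, "neg" if i % 3 == 0 else "pos") for i in range(90)]
mean_acc, std_acc = random_subsampling(data, majority_class_accuracy)
print(f"accuracy = {100 * mean_acc:.1f} +/- {100 * std_acc:.1f} %")
```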

The training sets were used to induce corresponding sets of rules. Since we wanted to prove the applicability of the approach to expert rules built on unknown principles, two algorithms of different types were used to simulate the rules. The first one was an ID3-like algorithm inducing discriminating rules and the second an AQ-type one producing covering rules. The CBF module transformed each rule set into generalized cases, which were then tested on the corresponding set of testing examples.

To evaluate the contribution of CBR and RBR to the final classification accuracy of the system, four different problem-solving algorithms were tested. The first one was pure RBR. In the second algorithm the solution was searched for by matching the problem at hand against the generalized cases produced by the rule transformation. In this algorithm (named TC-search) no testing cases were stored. The third algorithm was an incremental extension of the second one. In this algorithm each solved case was stored and then used in searching for the solution of the next problem. The algorithm may be seen as a naive CBR with exhaustive search in the case space. The last algorithm was an implementation of the method for integrating RBR and CBR described in this section. The average accuracy and standard deviations obtained are presented in Table 2.³

² The data was prepared by M. Zwitter and M. Soklic from the University Medical Center, Institute of Oncology, Ljubljana, Slovenia.

The results of the experiments show that the best accuracy is achieved by the proposed scheme for combining rules and cases. It is particularly interesting that on the PT database both the TC-search and the naive CBR methods used separately perform worse than RBR. This again supports the effectiveness of the indexing scheme used, which allows comparatively relevant cases to be retrieved for matching.


Domain   RBR (AQ)     TC-search    CBR          CoRCase
BC       62.0 ± 0.6   70.2 ± 2.3   70.6 ± 1.4   71.3 ± 3.0
PT       34.7 ± 4.2   33.5 ± 1.8   33.1 ± 2.0   37.0 ± 1.6

Table 2. Classification accuracy (%) of the tested problem-solving methods.


4 CASE-BASED REFINEMENT OF NEURAL NETWORKS

The second application of the proposed approach to building ILA was to the problem of case-based refinement of neural networks (NN).⁴ Our main goal was to study the potential of CBR for further improvement of a trained NN. That is why in that application only a part of the CB-reasoner architecture was used (Analyzer, Retriever and Modifier).

4.1. Knowledge Components

The training examples used for training the NN are the main source of knowledge. The same training set, along with information on how the examples were solved by the NN after completion of its training phase, was used as an implicit problem-solving model of the network.

4.2. Indexing Scheme

A case is represented by a list of (normalized) attribute-value pairs and its real solution (classification). Each case is indexed both by its real and NN solutions. If an example has been correctly classified, it is considered a "typical" case confirming the NN solution; otherwise it is considered an "exceptional" one.

³ See [3] for a more detailed description and evaluation of the experimental results.

⁴ Initial results are described in [4].

4.3. Case Base Formation

In our previous research [4] the process of case base formation was
implemented as a two-step procedure that tried to identify the part
of the "noisy" (from the NN point of view) training examples which
might be used by a CBR method. However, analysis of the initial
experiments with the system showed that this procedure was not very
effective, so now all training examples unsolved by the NN are stored
as useful cases.

The case matching process is implemented as a Nearest Neighbor
algorithm which uses the weighted Euclidean distance (see footnote 5):

$$\Delta(X, Y) = \sqrt{\sum_{i=1}^{n} w_i \left( \frac{x_i - y_i}{\max_i - \min_i} \right)^2}$$

where n is the number of attributes describing a case, x_i and y_i
stand for the values of the i-th attribute for cases X and Y, and
max_i and min_i for the maximal and minimal values of the i-th
attribute.

The attribute weights w_i are calculated by the CBF procedure using
the ReliefF algorithm [11].

4.4. Failure Prediction and Case Retrieval

Failure prediction is based on comparing the NN solution to the set
of failure predictors formed after construction of the case base. The
role of such predictors is played by the erroneous NN solutions of
the training examples included in the case base.

In all "suspected" situations the final system solution is found by
means of a CBR method that matches the current situation against past
cases which are not only exceptional but also typical w.r.t. the NN
solution.

The case retrieval set is formed in a two-step procedure. In the
first step, the exceptional case most similar to the current problem
is found. Then the k nearest neighbors of this case that confirm the
NN classification are retrieved. The final solution is sought among
these k + 1 cases.

4.5. Empirical Evaluation

The approach was implemented in an experimental ILA system, CorNCase2
(Correction Network by Cases, version 2). The system was tested on
four well-known ML benchmark databases described by numerical
attributes: glass (GL), diabetes (DB), breast cancer (BC) (see
footnote 6) and iris (IR) [13] (see Table 3).


Domain   Examples   Attributes   Classes   Missing (%)
BC       699        10           2         2.2
DB       145        5            3         0.0
GL       214        9            6         0.0
IR       150        4            3         0.0

Table 3. Main characteristics of the databases used in the empirical
evaluation.


Each database was randomly split into two non-overlapping subsets,
one for training (70% of examples) and one for testing (30% of
examples). The experiments were repeated ten times for different
splits and the results were averaged.
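
The evaluation protocol above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CorNCase2 code: the function names (`weighted_distance`, `nearest_neighbor_predict`, `repeated_holdout`) and the toy data are hypothetical, a plain weighted nearest-neighbor classifier stands in for the systems compared, and applying each weight as a multiplier of the squared, range-normalized difference is an assumption about the distance of Section 4.3.

```python
import math
import random
import statistics


def weighted_distance(x, y, weights, lo, hi):
    """Range-normalized weighted Euclidean distance between two cases."""
    total = 0.0
    for xi, yi, w, mn, mx in zip(x, y, weights, lo, hi):
        span = (mx - mn) or 1.0  # guard against constant attributes
        total += w * ((xi - yi) / span) ** 2
    return math.sqrt(total)


def nearest_neighbor_predict(train, query, weights, lo, hi):
    """Return the label of the training case closest to `query`."""
    best = min(train, key=lambda c: weighted_distance(c[0], query, weights, lo, hi))
    return best[1]


def repeated_holdout(cases, weights, repeats=10, train_fraction=0.7, seed=0):
    """Mean and standard deviation of test accuracy (%) over random splits."""
    rng = random.Random(seed)
    n_attrs = len(cases[0][0])
    accuracies = []
    for _ in range(repeats):
        shuffled = cases[:]
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]  # non-overlapping subsets
        lo = [min(c[0][i] for c in train) for i in range(n_attrs)]
        hi = [max(c[0][i] for c in train) for i in range(n_attrs)]
        correct = sum(
            nearest_neighbor_predict(train, x, weights, lo, hi) == label
            for x, label in test
        )
        accuracies.append(100.0 * correct / len(test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy data: two well-separated classes on the first attribute.
cases = [((float(v), 0.0), "a") for v in range(10)]
cases += [((float(v), 0.0), "b") for v in range(100, 110)]
mean_acc, std_acc = repeated_holdout(cases, weights=[1.0, 1.0])
```

Computing the attribute ranges from the training subset only keeps the test examples out of the normalization; whether the original experiments did the same is not stated in the text.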

The TB-RBF system [12] was selected as an instance of NN. TB-RBF uses
domain knowledge in the form of decision trees (or rules) to define
the topology of a radial basis function network. Each class is mapped
to an output unit, each attribute to an input unit, and hidden units
correspond to the branches of the tree. To evaluate the behavior of
CorNCase2 we also tested weighted (ReliefF-1NN) and unweighted (1NN)
variants of the Nearest Neighbor algorithm working on the TB-RBF
training set. The average accuracy and standard deviations obtained
are presented in Table 4.


Domain   TB-RBF       1NN          ReliefF-1NN   CorNCase2
BC       95.5 ± 1.2   95.6 ± 1.1   95.8 ± 1.2    96.3 ± 1.2
DB       91.9 ± 4.1   93.9 ± 4.4   94.4 ± 3.3    92.3 ± 4.0
GL       70.2 ± 4.6   69.5 ± 6.5   70.6 ± 6.2    75.0 ± 6.8
IR       97.1 ± 2.3   95.3 ± 1.9   96.4 ± 2.1    97.3 ± 1.7

Table 4. Classification accuracy (%) on the test databases used in
the experiments.

It can be seen that CorNCase2 outperforms TB-RBF on all databases.
Moreover, on three of them (BC, GL and IR) the proposed integrated
method outperforms both problem-solving paradigms used in the
integration. These results support the effectiveness of the proposed
scheme for indexing and retrieving cases.

It should be mentioned that the value of the parameter k (the number
of typical nearest neighbors for each exceptional case) used in the
experiments was set to 30 (for DB, GL and IR the same classification
accuracy was achieved with k = 15). This means that it is not
necessary to store the whole training set: we need only those cases
which are members of such lists of neighbors. As a result, in the
testing phase CorNCase2 uses about 80% of the training examples for
DB, 72% for GL, 60% for IR and only 36% for BC. This shows that
selecting a proper value for k (e.g. by applying cross-validation
techniques) may lead to significant data compression.
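
The retrieval procedure of Section 4.4 and the storage saving just described can be sketched together. This is only an illustration under assumptions: all names are hypothetical, an unweighted squared Euclidean distance stands in for the weighted distance, the search for the nearest exceptional case is restricted to cases whose erroneous NN output equals the current NN output, and the final solution is simply taken from the nearest of the k + 1 retrieved cases (the text does not say how it is chosen).

```python
def sq_dist(x, y):
    """Squared Euclidean distance (stand-in for the weighted distance)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def build_case_base(examples, nn_predict, k):
    """Keep each exceptional case with its k nearest typical neighbors."""
    typical = [(x, y) for x, y in examples if nn_predict(x) == y]
    entries = []
    for x, y in examples:
        nn_out = nn_predict(x)
        if nn_out == y:
            continue  # correctly solved by the NN: a "typical" case
        # Typical cases that confirm the (erroneous) NN classification.
        confirming = sorted(
            (c for c in typical if c[1] == nn_out),
            key=lambda c: sq_dist(c[0], x),
        )
        entries.append({"case": (x, y), "nn_out": nn_out, "neighbors": confirming[:k]})
    return entries


def classify(x, nn_predict, entries):
    """Return the NN solution unless it matches a failure predictor."""
    nn_out = nn_predict(x)
    suspects = [e for e in entries if e["nn_out"] == nn_out]
    if not suspects:
        return nn_out
    nearest = min(suspects, key=lambda e: sq_dist(e["case"][0], x))
    candidates = [nearest["case"]] + nearest["neighbors"]  # k + 1 cases
    return min(candidates, key=lambda c: sq_dist(c[0], x))[1]


def stored_fraction(examples, entries):
    """Fraction of the training set that must be kept for testing."""
    stored = set()
    for e in entries:
        stored.add(e["case"])
        stored.update(e["neighbors"])
    return len(stored) / len(examples)


# Toy "network": a threshold that misclassifies the example at 0.5.
toy_nn = lambda x: "pos" if x[0] > 0 else "neg"
examples = [((-3.0,), "neg"), ((-2.0,), "neg"), ((-1.0,), "neg"),
            ((0.5,), "neg"), ((1.0,), "pos"), ((2.0,), "pos"),
            ((3.0,), "pos"), ((10.0,), "pos"), ((11.0,), "pos")]
entries = build_case_base(examples, toy_nn, k=2)
```

In this toy setting only three of the nine training examples are kept: the single exceptional case and its two typical neighbors.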


5 Only numerical attributes are used.

6 Missing values were replaced with the most frequently occurring
values for the respective attributes.

5  CONCLUSION

In this paper we propose an approach to building an integrated
learning architecture in which case-based reasoning is used both to
correct solutions inferred by other problem-solving paradigms and as
a method for accumulating and refining domain knowledge. The
architecture is intended to solve the classification task. The
applications of the approach to case-based maintenance of rule-based
systems and to case-based refinement of neural networks are briefly
described. Different aspects of the architecture have been tested on
several well-known machine learning databases. The experimental
results indicate that the approach is an encouraging step towards
creating integrated learning architectures.

Abstract

Computational Embodiment is the computer implementation of principles
that we believe will lead to more autonomous and self-generating
behaviors that will allow software systems to exist in and interact
with complex environments. We restrict our attention here to symbolic
environments (MUDs), as an initial step towards understanding and
constructing "interaction spaces" in which humans and computer
programs can interact on an equal footing.

Our approach to constructing autonomous software systems is based on
theoretical work on the structures that underlie language and
movement in biological systems and on the structure of constructed
complex systems mediated or integrated by software.

This paper describes how our "Wrapping" approach to software
integration, which is a computationally reflective dynamic
integration infrastructure, can implement autonomous systems. The
wrapping approach supports autonomy by providing at least primitive
versions of the functions that are necessary, and by providing an
infrastructure that makes changing those functions easy.

1  Introduction 

In this paper and its companion [4], we describe an approach to
constructing autonomous agents that is based on theoretical work on
the organization of structures that underlie language and movement
and on the structure of constructed complex systems mediated or
integrated by software.

We are constructing autonomous software agents in symbolic
environments called Multi-User Domains (MUDs) [18] [19] [11], using a
style of "computational embodiment", which is part of our research
program on interaction spaces [15].

In our view, a system is "autonomous" if it can be said to have
"purposeful behavior", e.g., act independently based on internally
generated intentions [5]. There are really only two classes of
(difficult) requirements for effective autonomy: robustness and
timeliness. Robustness means graceful degradation in increasingly
hostile environments, as well as an ability to exploit unexpected
advantages in the environment, which to us implies a requirement for
adaptability. Timeliness means that situations are recognized "well
enough" and "soon enough", and that "good enough" actions are taken
"soon enough". There is not necessarily any optimization here.

In this paper, we describe our notion of wrappings as dynamic
infrastructure (Section 2). Then we turn to a description of the
architecture we use (Section 3), and show how wrappings are used to
implement the architecture, so we can study some fundamental theory
of computational embodiment from the companion paper [4]. Finally, we
discuss our prospects for the future (Section 4).

2    Wrapping  Background 

Our research in complex systems has shown the importance of
infrastructure, that is, explicit components and activities of the
system whose function is to help organize the rest [16]. No matter
what kinds of computational models are used, the system will need
infrastructure which supports complex interactions. Our "wrapping"
approach to intelligent integration infrastructure for constructed
complex systems provides a natural means for incorporating adaptation
and other processes as computational resources. It is based on
automatic processing of explicit qualitative information about all of
the system components and their interconnection architecture. Its
advantages are (1) a simplifying uniformity of description using the
meta-knowledge organized into Wrapping Knowledge Bases, and (2) a
corresponding simplifying uniformity of processing that
meta-knowledge using algorithms called Problem Managers, which are
active integration processes that use the meta-knowledge to organize
the system's computational resources in response to problems posed to
it by users (who can be either computing systems or humans). In
particular, since the entire process is recursive [13] [14],
wrappings provide a general way to allow specialized methods to
participate.

We believe that effective use of any sufficiently complex system
requires the system itself to provide many different kinds of
assistance to its users, whether they are humans or other computer
systems. The system must provide what we have called the Intelligent
User Support (IUS) functions:

•  Selection (which resources can be applied to a particular
problem),

•  Assembly (how to let them work together),

•  Integration (when and why they should work together),

•  Adaptation  (how  to  adjust  them  to  work  on  certain 
kinds  of  problems),  and 

•  Explanation  (why  certain  resources  were  or  will  be 
used  or  not  used). 

2.1    Wrapping  Overview 

The  wrapping  theory  has  four  basic  features: 

1.  ALL parts of a system architecture are resources, including
programs, data, user interfaces, and everything else;

2.  ALL activities in the system are problem study (i.e., applying a
resource to a posed problem), including user interactions,
information requests and announcements within the system, service or
processing requests, etc.;

3.  Wrapping Knowledge Bases (WKBs) contain wrappings, which are
explicit machine-processable descriptions of all of the resources and
how they can be applied to problems to support the five IUS functions
above;

4.  Problem Managers (PMs), including the Study Managers (SMs) and
the Coordination Manager (CM), are algorithms that process those
wrappings to collect and select resources to apply to problems. They
are also resources, so they are also wrapped.

The wrapping information and processes form expert interfaces to all
of the different ways to use resources in a heterogeneous system that
are known to the system.

2.2  Wrapping  Information 

The main information entities are Wrapping Knowledge Bases (WKBs).
The WKBs contain explicit, machine-readable descriptions of each
computational or information processing element (called a resource)
in the system: not just how to use it, but also whether and when and
why and in what kinds of combinations it should or can be used. The
entries in the WKB are called "wrappings". Wrappings have a variable
syntax, according to our "variables all the way down" principle, but
there are some common features in the semantics [12], which we
describe briefly next.

Each wrapping is a list of "problem interpretation" entries, each of
which describes one way in which this resource can be used to deal
with a problem. There may be several problem interpretation entries
for the same problem if the resource has many different ways of
dealing with it. Each of these problem interpretation entries has
lists of conditions that must hold for the resource to be considered
or applied. These sets of conditions are important at different
times: one at resource consideration or planning time, and the other
at resource application time. These conditions range from data type
and interaction protocol statements (for "how" to apply the resource)
to qualitative knowledge about the context under which it is
appropriate to apply the resource. These act as pre-conditions for
the application of the resource. The post-condition is the "product"
list of context component assignments, which describe what
information or services this resource makes available when it is
applied.
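
One way to picture such a wrapping is as a small data structure. The sketch below is hypothetical: the class and field names (`ProblemInterpretation`, `planning_conditions`, `application_conditions`, `product`) are our own illustration, conditions are modeled as plain predicates over a context dictionary, and the actual wrapping syntax is variable, as noted above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]
Condition = Callable[[Context], bool]


@dataclass
class ProblemInterpretation:
    """One way in which a resource can deal with a problem."""
    problem: str
    planning_conditions: List[Condition] = field(default_factory=list)
    application_conditions: List[Condition] = field(default_factory=list)
    product: List[str] = field(default_factory=list)  # what it makes available

    def considerable(self, context: Context) -> bool:
        """Pre-conditions checked at consideration (planning) time."""
        return all(cond(context) for cond in self.planning_conditions)

    def applicable(self, context: Context) -> bool:
        """Pre-conditions checked at application time."""
        return all(cond(context) for cond in self.application_conditions)


@dataclass
class Wrapping:
    """A resource description: a list of problem interpretation entries."""
    resource: str
    interpretations: List[ProblemInterpretation] = field(default_factory=list)

    def candidates(self, problem: str, context: Context) -> List[ProblemInterpretation]:
        """Entries for `problem` whose planning-time conditions hold."""
        return [e for e in self.interpretations
                if e.problem == problem and e.considerable(context)]


# Hypothetical resource: an in-memory sorter, suitable for small inputs.
sorter = Wrapping(
    resource="in_memory_sort",
    interpretations=[
        ProblemInterpretation(
            problem="sort",
            planning_conditions=[lambda ctx: ctx.get("size", 0) < 1000],
            application_conditions=[lambda ctx: ctx.get("data_type") == "list"],
            product=["sorted_data"],
        )
    ],
)
small = {"size": 10, "data_type": "list"}
large = {"size": 10**6, "data_type": "list"}
```

Keeping the two condition lists separate mirrors the distinction in the text between planning time and application time.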

2.3  Wrapping  Processes 

The wrapping processes are active coordination processes that use the
wrappings for the Intelligent User Support functions [12] [13]. They
also provide overview via perspective and navigation tools, context
maintenance functions, monitors, and other explicit infrastructure
activities. Since they are also wrapped, the entire system is
Computationally Reflective [17] [10] [13]. It is this ability of the
system to analyze its own behavior that provides some of the power
and flexibility of resource use.

2.3.1  Coordination  Manager 

The alternation between problem definition and problem study, and the
determination of an appropriate context of study, are organized by
the Coordination Manager (CM), which is one of the resources that
coordinates the wrapping processes. The basic problem study sequence
is managed by a kind of resource called the Study Manager (SM), which
organizes problem study into a sequence of basic steps that we
believe represent a fundamental part of problem solving.

The  CM  runs  a  sequence  of  steps  that  manages  the  overall 
system  behavior: 

Find context : get a containing context from the user
loop:
    Pose problem : determine the current problem
    Study problem : use an SM to address it
    Present result : to user

Each  step  is  a  problem  posed  to  the  system  by  the  CM, 
which  then  uses  the  default  SM  to  manage  the  system's 
response  to  the  problem. 

The  main  purpose  of  the  CM  is  cycling  through  the  other 
three  problems,  which  are  posed  by  the  CM  in  the  context 
found  by  the  first  step.  This  way  of  providing  context 
and  tasking  for  the  SM  is  familiar  from  many  interactive 
programming  environments:  the  "Find  context"  part  is 
usually  left  implicit,  and  the  rest  is  exactly  analogous  to 
LISP's  "read-eval-print"  loop,  though  with  rather  different 
processing  at  each  step,  mediated  by  one  of  the  SMs.  In 
this  sense,  the  CM  steps  represent  the  basic  heartbeat  of 
the  system. 
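
The CM cycle described above can be sketched as a short loop. This is a hedged illustration only: `coordination_manager`, `toy_study_manager`, the `user` dictionary, and the stopping rule (stop when no problem is left, or after `max_cycles`) are assumptions made for the example, not part of the wrapping system itself.

```python
def coordination_manager(study_manager, user, max_cycles=10):
    """Run the CM heartbeat; every step is posed as a problem to the SM."""
    context = study_manager("Find context", user, None)
    presented = []
    for _ in range(max_cycles):
        problem = study_manager("Pose problem", user, context)
        if problem is None:  # nothing left to study (assumed stopping rule)
            break
        result = study_manager("Study problem", problem, context)
        presented.append(study_manager("Present result", result, context))
    return presented


def toy_study_manager(problem_name, data, context):
    """Stand-in for the default SM: one fixed handler per posed problem."""
    handlers = {
        "Find context": lambda user, ctx: {"user": user["name"]},
        "Pose problem": lambda user, ctx: user["queue"].pop(0) if user["queue"] else None,
        "Study problem": lambda problem, ctx: sum(problem),
        "Present result": lambda result, ctx: f"{ctx['user']}: {result}",
    }
    return handlers[problem_name](data, context)


# A "user" with two pending problems, each a tuple of numbers to add up.
user = {"name": "alice", "queue": [(1, 2), (3, 4, 5)]}
outputs = coordination_manager(toy_study_manager, user)
```

As in a read-eval-print loop, the context is found once and then reused for every cycle.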

2.3.2  Study  Manager 

The Study Manager is the central algorithm of our problem study
strategy. It is the default resource for the problem "Study problem".
Its purpose is to organize the resources that process the wrappings,
and to cause and monitor the behaviors the wrappings describe. There
may be other resources that are intended for this same problem, but
the SM is the one that is chosen if no other resource applies. This
overlap of function illustrates a general principle of resource
coordination we have used throughout: instead of trying to find one
general method for all cases (which we do not believe is possible
anyway), we combine general methods for certain processes with
powerful specialized methods that apply in certain contexts.

The  SM  process  begins  with  a  problem  poser,  a  problem 
defined  by  its  name  and  associated  data,  and  the  context 
in  which  the  problem  was  originally  posed. 

We have divided the "Study problem" process into three main steps:
"Interpret problem", which means to find a resource to apply to the
problem; "Apply resource", which means to apply the resource to the
problem in the current context; and "Assess results", which means to
evaluate the result of applying the resource and possibly to pose new
problems. We further subdivide problem interpretation into five
steps:

Interpret problem :
    Match resources : get list of candidates
    Resolve resources : reduce list, make bindings
    Select resource : choose one resource to apply
    Adapt resource : finish the bindings
    Advise poser : describe bindings chosen
Apply resource : go do it
Assess results : evaluate
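
The step sequence above can be sketched as a single function. This is a minimal illustration under assumptions: the Wrapping Knowledge Base is reduced to a list of dictionaries, `study_manager` and the two toy resources are hypothetical names, "Select resource" simply takes the first surviving candidate, and "Assess results" only checks that something was returned.

```python
def study_manager(problem, data, context, wkb):
    """Default study sequence over a toy WKB (a list of dicts)."""
    # Interpret problem
    candidates = [w for w in wkb if w["problem"] == problem]       # Match resources
    resolved = [w for w in candidates if w["condition"](context)]  # Resolve resources
    if not resolved:
        return {"advice": f"no resource for {problem!r}", "result": None, "ok": False}
    chosen = resolved[0]                                           # Select resource
    bindings = {"data": data, "context": context}                  # Adapt resource
    advice = f"{problem!r} handled by {chosen['name']}"            # Advise poser
    # Apply resource
    result = chosen["apply"](**bindings)
    # Assess results
    return {"advice": advice, "result": result, "ok": result is not None}


wkb = [
    {"name": "exact_sum", "problem": "total",
     "condition": lambda ctx: ctx.get("exact", False),
     "apply": lambda data, context: sum(data)},
    {"name": "rough_sum", "problem": "total",
     "condition": lambda ctx: True,
     "apply": lambda data, context: round(sum(data), -1)},
]
exact = study_manager("total", [12, 31], {"exact": True}, wkb)
rough = study_manager("total", [12, 31], {}, wkb)
missing = study_manager("translate", "hello", {}, wkb)
```

The same problem is answered by different resources depending on the context, which is the behavior that the match, resolve and select steps are meant to provide.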

2.3.3     SM  Recursion 

Up to this point in the description, the SM is just a (very) simple
type of planning algorithm. Several additional design features make
it a framework for something more. First, all of the wrapping
processes are themselves wrapped. Second, the processing is
completely recursive. All of the steps listed above for the SM are
also posed problems. Third, there are other SMs that have slightly
more interesting algorithms (such as looping through all the
candidate resources to find one that succeeds, as described in the
next section). These three things mean that every new planning idea
that applies to a particular problem domain (which information would
be part of the context) can be written as a resource that is
selectable according to context; it also means that every new
mechanism we find for adaptation or every specialization we have for
application can be implemented as a separate resource and selected at
an appropriate time. It is this recursion that leads to the power of
wrapping, allowing basic problem study algorithms to be dynamically
selected and applied according to the problem at hand and the context
of its consideration.

This recursion has profound implications in other applications of
wrapping [14], but for its application to software development, the
fact that the SM can be selected is a key. Since the SM steps above
that interact with the Wrapping Knowledge Bases (WKB) are themselves
posed problems, we can use completely different syntax and largely
different semantics for different parts of the WKB in different
contexts, and select the appropriate processing algorithms according
to context. In particular, whether one writes about one WKB or
several is a matter of taste and viewpoint. Finally, the SM is only
one of a family of processes called Problem Managers (PMs), each of
which is wrapped and selectable according to context, and each of
which can pose problems and organize their study in different ways.
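
The effect of posing every SM step as a problem can be illustrated with a toy registry. Everything here is hypothetical (`registry`, `register`, `pose`, the context key `prefer_last`, and the rule that the most recently registered resource is tried first); it only shows how a specialized resource for one step can be selected by context without touching the default SM.

```python
registry = {}  # problem name -> list of (condition, resource), newest first


def register(problem, resource, condition=lambda ctx: True):
    """Make `resource` available for `problem` when `condition` holds."""
    registry.setdefault(problem, []).insert(0, (condition, resource))


def pose(problem, data, context):
    """Apply the first registered resource whose condition holds."""
    for condition, resource in registry.get(problem, []):
        if condition(context):
            return resource(data, context)
    raise LookupError(f"no resource for problem {problem!r}")


def default_sm(data, context):
    """A study manager whose own steps are posed problems."""
    candidates = pose("Match resources", data, context)
    chosen = pose("Select resource", candidates, context)
    return pose("Apply resource", (chosen, data), context)


# Default resources for the SM and its steps.
register("Study problem", default_sm)
register("Match resources", lambda data, ctx: sorted(ctx["tools"]))
register("Select resource", lambda cands, ctx: cands[0])
register("Apply resource", lambda pair, ctx: f"{pair[0]}({pair[1]})")

# A specialized selector that applies only in one kind of context.
register("Select resource", lambda cands, ctx: cands[-1],
         condition=lambda ctx: ctx.get("prefer_last", False))

default_answer = pose("Study problem", "x", {"tools": ["b", "a"]})
special_answer = pose("Study problem", "x", {"tools": ["b", "a"], "prefer_last": True})
```

Because "Study problem" is itself looked up in the same registry, an alternative SM could be registered in exactly the same way as the alternative selector.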

3    Implementation  Issues 

In this section, we first describe the MUD environment with which the
agent must interact, and then the architecture of the experimental
system we intend to use for our autonomous agent, based on wrapping.
Multi-User Domains (MUDs) are an interesting new kind of groupware
that incorporates people into the program. MUDs have become
enormously popular as games and as educational support tools over the
last few years because they get the human interactions right in some
fundamental sense, and because they engage our sense of "place" [18]
[19] [11] [15]. MUD clients and servers are easy to obtain and run
(most servers and clients are free), but they usually only provide
text worlds; there is little interaction with existing tools that are
outside the MUD; though some have construction languages that allow
complex programming, it is the usual kind of programming; and it is
not very easy to access large volumes of information.

3.1    MUD  Architecture 

We start with a description of MUD architecture [15], because that
characterizes the environment in which the agent needs to act. We
also characterize the MUD environment via the important styles of
interaction, so that the agent can distribute incoming interaction
items appropriately.

The  MUD  environment  is  defined  by  its  architecture: 

•  Connectivity  Management, 

•  Virtual  World,  and 

•  Computational  Infrastructure. 

The Connectivity Management Layer is responsible for the multi-user
capabilities of a MUD. This layer is responsible for the information
transmission in both directions between users (i.e., humans or
computer programs that use the MUD), and characters (i.e., objects
within the MUD that have externally provided behavior). It is common
across many styles of MUD, and will not be discussed further in this
paper.

Our autonomous agent interacts mainly with the Virtual World, using a
small separate layer for interaction with the Connectivity Management
part of the MUD. The Virtual World consists of locations (called
"rooms") and interconnections (called "exits"), each of which has a
description. We say that an exit is "linked" to a room if the room is
its destination (all links are one-way). The only kinds of behavior
that are allowed are issuing commands to the server and following
links.
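
The room and exit structure just described can be sketched as a small data model. This is an illustration only, not TinyMUD's internal representation: the class names, the `link` and `follow` helpers, and the choice to stay in place when an exit does not exist are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """A location in the Virtual World."""
    name: str
    description: str
    exits: dict = field(default_factory=dict)  # exit name -> Exit


@dataclass
class Exit:
    """A one-way interconnection, "linked" to its destination room."""
    name: str
    description: str
    destination: Room


def link(source, name, description, destination):
    """Create an exit in `source` that is linked to `destination`."""
    source.exits[name] = Exit(name, description, destination)


def follow(room, exit_name):
    """Follow a link; stay in place if the room has no such exit."""
    exit_ = room.exits.get(exit_name)
    return exit_.destination if exit_ else room


hall = Room("hall", "A draughty hall.")
garden = Room("garden", "An overgrown garden.")
link(hall, "north", "An oak door.", garden)
```

Since links are one-way, the garden has no exit back to the hall until one is created explicitly with a second call to `link`.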

For our discussions of MUD architecture, we use one of the simplest
MUD servers, called TinyMUD [2] [18] [19], since it has all the
essential technical features we want, it is simple enough that those
features can be extracted and described easily, and it illustrates
one of the most important aspects of MUDs: it is the writers and
artists that define the culture (not the technical substrate), and it
is the culture that makes a MUD viable or not.

The  important  aspects  of  the  Virtual  World  are: 

•  places  and  connections  between  them, 

•  movements, 

•  objects  and  their  locations, 

•  actions,  and 

•  relationships  among  aspects 

Everything in the system is a model, which we use to represent
objects, relationships, and actions. Everything has attributes, one
of which is location. Connections are maps from one place to another.
Movements are actions that attempt to use connections or
connectionless maps (e.g., teleport). Actions can change any
attributes. This is one classification hierarchy of everything, which
is derived from the ones common to many MUD servers and other Virtual
environments. It will be mapped into a hierarchy appropriate for
understanding our experimental environment.

3.2    Architecture  of  Experimental  System 

The basic architecture of a wrapping system has three layers [12]
[13]: the bottom two layers are the Message Distribution Layer, which
can be implemented with explicit messages or in other ways, and the
Wrapping Core Layer, which contains the default Coordination Manager
and Study Manager, and the default resources for CM and SM steps.
Above that are the wrapping resources (other SMs and PMs), utilities
(optimizers and such), and applications. These can be thought of as a
single Resource Layer.

It follows that our architecture problem becomes a matter of
identifying appropriate resources and their wrappings. We need to
describe the problems, resources, and wrappings for the autonomous
agent. The architecture is adaptive, so that we can insert new
computational resources into connections when they become available.
It is also different for different experiments, and there is a
meta-architecture that helps build new architectures.

3.3    Agent  Architecture 

Our computational version of the language and movement theory [3] is
that there are layers of symbol systems that separately normalize
external signals into interesting information spaces, so that the
useful processing can take place in those spaces (this separation is
the analogue of the behavior of different senses).

The  agent  runs  a  set  of  steps  that  manages  its  overall 
behavior: 

Login : to remote server
concurrent loop:
    Look around : determine current local environment
    Talk : to characters, local or remote
    Move around : to other places
    Do commands : choose some available commands

All of these processes are concurrent, making suggestions about what
to do next, but the agent usually doesn't move around or do commands
and talk locally at the same time (these processes interfere with
each other because of focus of attention).

Some  of  the  major  low-level  communication  resources  in 
the  agent  have  the  following  tasks  (these  are  problems): 

•  login  to  remote  server, 

•  convert  incoming  character  strings  into  interaction 
items, 

•  convert  outgoing  items  to  characters,  and 

•  distribute  interaction  items  according  to  context  and 
content. 


The  agent  interacts  with  the  Virtual  World  with  other 
tasks: 

•  parse  room  descriptions  and  other  information, 

•  identify  connections,  commands,  and  conversational 
streams, 

•  generate  movement  and  other  actions,  and 

•  generate  conversational  interaction  items. 

We have constructed a number of local interaction grammars using
CYKPL (my variant of V. Pratt's combination of the widely used
Cocke-Younger-Kasami and Earley parsers [7] [12]), but we can use any
of the common table or chart parsers [9], or any other mechanism we
choose [1], and with that mechanism we can use whatever new user
language we choose (of course, some language choices are better
suited to certain kinds of parsing, and vice versa).
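
For readers unfamiliar with table parsers, the following is a minimal sketch of a plain Cocke-Younger-Kasami recognizer applied to a tiny command grammar. It is not CYKPL: the function name `cyk_recognize`, the grammar encoding (two dictionaries for a grammar in Chomsky normal form), and the sample MUD-style commands are all assumptions made for illustration.

```python
def cyk_recognize(tokens, unary, binary, start="S"):
    """Return True if `tokens` can be derived from `start`.

    `unary` maps a token to the nonterminals that produce it;
    `binary` maps a pair (B, C) to the nonterminals A with rule A -> B C.
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][length] holds the nonterminals deriving tokens[i:i + length].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][1] = set(unary.get(token, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= binary.get((left, right), set())
    return start in table[0][n]


# Hypothetical grammar for two kinds of command: "go <direction>" and "say <word>".
unary = {
    "go": {"Go"}, "say": {"Say"},
    "north": {"Dir"}, "south": {"Dir"},
    "hello": {"Word"},
}
binary = {
    ("Go", "Dir"): {"S"},
    ("Say", "Word"): {"S"},
}
```

With this grammar, `cyk_recognize(["go", "north"], unary, binary)` accepts the command, while a reordered or incomplete command is rejected.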

4  Prospects 

Autonomous systems must be complex systems, with an ever-increasing
repertoire of possible behaviors and processes for selecting them,
fallback choices to account for incorrect situation estimation, and
quick partial solutions to reduce decision time. All activity is
situated, strongly dependent on context, and there need to be
different decision processes in different situations. This
flexibility requires an architecture in which many parts of the
system are infrastructure, organizing other parts of the system to
identify and address problems, and monitoring their behavior [8].

The wrapping approach provides several simplifying uniformities for
the infrastructure: Wrapping Knowledge Bases uniformly represent the
computational resources (uniformity of description), wrapping
processes uniformly treat system activity as problem study
(uniformity of processes), and treating the wrapping processes as
resources also provides a computationally reflective system, which
can reason about its own behavior. The ability to reflect on one's
own behavior (and modify it) is one of the main keys to effective
autonomy.

Our study of autonomy is part of a larger research program on
interaction spaces [15], which are computer-mediated virtual places
in which computer programs and humans meet. Since wrappings allow
many special case processes to be combined with general case methods,
and MUDs allow multiple programs to be connected together with
multiple humans as users, all in the same environment, we have taken
the wrapping architecture and applied it to the MUD servers to define
these interaction spaces.

Interaction spaces are a new way for computer programs to be
integrated, by combining the interactions among the different kinds
of users of the space, be they humans or computer programs. The idea
is that explicit attention to using co-experience as an integration
mechanism can lead to very different styles of interaction among
computational agents and humans, and therefore possibly much more
interesting program structures and behaviors. This paper is about the
integration infrastructures and architectures that support adaptive
and reflective behaviors. We believe that using them, we can build
agents that are much more useful partners in an increasingly complex
information environment.

ABSTRACT

Our  natural  world  is  dynamically  composed  in  real  time. 
Despite  frequent  perturbations  and  structural  irregularities, 
substantial  zones  of  stability  are  observable.  The  historical 
emergence  of  such  structurally  stable  zones  from  simpler 
components  is  supported  by  lines  of  evidence  from  many 
disciplines  (2,3,12,10,16).  In  order  to  construct  a  common 
semiotic  lineage  among  the  diverse  symbol  systems  used  by 
various  disciplines,  a  theory  of  organization  was  constructed. 
The  foundations  of  the  theory  are  developed  from  observations 
on  the  nonlinear  dynamics  of  organisms  within  ecoments  - 
ecoments  being  defined  as  the  immediate  surroundings  of 
systems. Each system (sub-system, sub-sub-system, ...) is
assigned four primitive attributes (closure, conformation,
concatenation and cyclicity). In principle, each of these four
terms is enumerable for a local system. Degrees of organization
(symbolized as O°) are composed from lesser organized systems
to higher organized systems in terms of the enumeration of the
four primitives. The emergent organizations are enumerated:
1°, 2°, 3°, ... . The patterns of organization at any particular
level, O°, are composed from patterns at other levels. Thus, no
particular  science  or  philosophy  is  assigned  a  privileged  role  in 
the  unfolding  of  the  dynamics. 

Mathematically, the systems are composed under the scientific
representations of categories as developed in Ehresmann and
Vanbremeersch [10] and Chandler [4,5,7]. Categorical
transitions can be viewed as homologous to Thom's four
archetypal singularities [17, 18]. This notation parallels
natural  history  and  allows  the  facile  accounting  of  the 
molecular  biological  mechanisms  within  a  living  system. 
Implications  of  this  theory  of  natural  organization  for  the 
design  of  artificial  hierarchical  systems  are  apparent. 

Key Words: Category theory, complexity, hierarchy,
emergence,  theoretical  biology,  biochemistry,  semiotics, 
modeling,  information  theory,  nonlinear  dynamics, 
structuralism,  organizational  theory,  conceptual  bindings, 
cell,  genetics,  health  and  disease 

1.  SEMIOTICS  OF  REPRESENTATIONS 

How  are  dynamics  of  the  organic  components  organized  to 
form  organisms?  The  semiotic  character  of  the  problem  is 
exemplified  by  noting  that  the  terms  organic,  organism, 
and  organization  are  all  derived  from  the  same  root!  The 
common semantic origin of these terms suggests that a
common perceptual pattern underlies the
phenomena of complex pattern generators. If such a
perceptual pattern exists, then it should be possible to
construct a network of symbols and languages such that
the representations of physical, chemical, biochemical,
cellular, individual and population processes are
meaningfully interrelated. For example, for material
objects, a criterion of meaning could be a common
understanding of the scale of an object, its parts, its forms
of birth and death, and its capacities to unite with or
separate from other objects (analogous to Thom's
archetypal singularities (17)). Meaningful transdisciplinary
communication  depends  on  a  mutual  understanding  among 
the  participants,  for  example,  between  the  business 
manager  and  the  mathematician.  To  understand  the 
meaning  of  a  semantic  term  for  a  material  object  implies 
some  knowledge  of  the  relationships  among  the  parts,  the 
whole  and  the  embedding  system  surrounding  the  object 
itself  (6).  This  is  a  major  task  for  a  biological  notation 
because  of  the  unique  sequences  of  macromolecules  which 
generate  the  unique  dynamics  of  each  individual  organism. 
Since  the  role  of  individual  DNA  base  changes  (mutations) 
in  the  genesis  of  specific  disease  states  is  frequently 
observed,  a  useful  notation  must  represent  both  the 
structural  form  of  a  complex  system  as  well  as  its  dynamic 
unfolding.  Causal  pathways  are  intrinsic  to  complex 
behaviors. A semiotics of complexity should include a
crisp representation of causality in order to construct a
useful notation. Analogously, the meaning of the
mathematical  models  should  provide  a  basis  for 
predictions.  If  the  scientific  models  of  complex  systems  are 
not  relatable  to  practice,  the  primary  goal  will  not  be 
achieved. 

2.  Structures  for  Scientific  Semiotics 

If  science  is  to  become  a  unified  body  of  knowledge,  a 
congruency  must  be  extensible  from  simple  to  complex 
systems.  Can  the  search  for  congruence  among  what  we 
observe,  what  we  symbolize  and  what  we  describe  be 
viewed  as  an  alternative  expression  of  a  desire  for  scientific 
predictions?  Predictability  for  a  complex  system  requires  a 
congruence  among  observations,  languages  and  symbols, 
as geometrically illustrated in Figure 1. Can this figure
also be viewed as a genitive symbol which is independent
of the scale, history, or behavior of any particular system
and for organizing transdisciplinary communication? Are
these  questions  complementary  to  one  another? 


Figure 1. Inchoate symbol for a common scientific
semiotics. (Diagram labels: Observations, Symbols, Languages.)

Traditionally,  scientists  have  sought  to  isolate  individual 
objects  or  classes  of  objects  for  study.  Disassembly  of 
natural  systems  into  component  parts  and  analysis  of  the 
individual  units  was  a  successful  strategy  for  the  study  of 
simple  natural  systems.  Such  work  tends  to  accent  the 
independence  of  the  components.  Indeed,  physical  theories 
often  further  accent  the  notion  of  independence  or 
autonomy,  both  in  the  conceptualization  and  the 
mathematical  representation.  The  examples  of  quantum 
mechanics  and  thermodynamics  illustrate  the  critical 
importance  of  the  concept  of  independence  to  scientific 
thought.  However,  emergence  of  a  new  degree  of 
organization  depends  on  interdependence  -  a  special 
cooperation  among  the  components  such  that  a  vertical 
genesis  process  occurs  in  order  to  construct  a  hierarchy. 
For  example,  the  organic  bases  within  a  DNA  sequence  of 
a  genetic  system  are  not  independent,  but  are  tightly  linked 
to  each  other  to  form  a  double  helix.  This  assessment  of 
interdependence  among  objects  was  crucial  to  a  logical 
construction  of  complexity  (the  C*  hypothesis)  in  the 
WESScomm  I,  II,  and  III  papers,  (4,5,7)  where  the 
observed  logical  interdependence  among  the  classes 
explicitly  motivated  the  symbolic  representation.  The 
description  of  complexity  was  built  by  organized  systems 
of constraints. Thus, conceptual interdependence formed
the  basis  for  the  organization  of  the  C*  hypothesis. 

A  motivation  for  logical  interdependence  comes  from  the 
necessary  and  sufficient  conditions  for  life  (5).  More 
explicitly  stated,  causality  in  biological  systems  requires 
an  interdependence  among  three  degrees  of  organization  - 
the  genetic  system,  the  organism  and  the  ecosystem  as 
illustrated  by  the  following  relations: 

A  genome  is  a  necessary  but  not  a  sufficient  condition 
for  the  existence  of  an  organism. 

An  ecosystem  is  a  necessary  but  not  a  sufficient 
condition  for  the  existence  of  an  organism. 

The  sufficiency  of  a  specific  ecosystem  to  sustain  life 
of  a  specific  organism  depends  on  the  genome. 


The  sufficiency  of  a  specific  genome  to  sustain  life 
depends  on  the  specific  ecosystem. 

Abstractly  stated,  internal  and  external  causes  fit  together 
dynamically  to  create  necessary  and  sufficient  conditions 
for the emergence and sustenance of life's patterns.
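
The four relations above can be read as a joint condition on a genome and an ecosystem. The following is a minimal Python sketch, assuming (hypothetically) that a genome can be summarized by the nutrients its organism must import and an ecosystem by the nutrients it supplies. The names `Genome`, `Ecosystem` and `organism_can_exist` are illustrative and are not part of the notation proposed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Genome:
    # Assumption: a genome is reduced to the nutrients its organism must import.
    required_nutrients: frozenset


@dataclass(frozen=True)
class Ecosystem:
    # Assumption: an ecosystem is reduced to the nutrients it supplies.
    supplied_nutrients: frozenset


def organism_can_exist(genome: Optional[Genome],
                       ecosystem: Optional[Ecosystem]) -> bool:
    """Each argument is necessary; sufficiency depends on how the two fit."""
    if genome is None or ecosystem is None:
        return False  # neither a genome nor an ecosystem suffices alone
    # Sufficiency of one is judged only relative to the other.
    return genome.required_nutrients <= ecosystem.supplied_nutrients


# Hypothetical example values.
yeast = Genome(frozenset({"glucose", "nitrogen"}))
grape_juice = Ecosystem(frozenset({"glucose", "nitrogen", "water"}))
distilled_water = Ecosystem(frozenset({"water"}))
```

Under these assumptions the same genome is viable in one ecosystem and not in another, which is the interdependence the relations express.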

A primary objective of this notation is to express the structure
of  complexity  in  symbolic  terms  such  that  mathematical 
relationships  among  the  patterns  can  be  explored.  Which 
mathematical  structures  are  sufficiently  rich  to  support  a 
representation  of  interdependence,  of  hierarchical  degrees  of 
organization  and  of  nonlinear  dynamics?  Ehresmann  [11] 
noted  an  abstract  freedom  in  creating  mathematical 
structures  and  that  the  "theory  of  categories  seems  to  be  the 
most  unifying  trend  in  present  day  mathematics;...  ." 
Ehresmann and Vanbremeersch (10), Thom (17), Rosen
(15),  and  Baas  (1)  have  suggested  the  use  of  category 
theory  in  science.  The  structural  simplicity  of  category 
theory  (objects,  morphisms,  and  compositions)  lends  itself 
to  logical  applications.  It  has  found  wide  application  in 
the  design  of  computer  algorithms  (object  oriented 
programming)  (19).  Ehresmann  and  Vanbremeersch,  in  a 
long  series  of  papers  starting  in  1987,  have  pioneered  the 
use  of  category  theory  to  describe  mental  processes  by 
constructing  categorical  "Memory  Evolutive  Systems."  A 
hierarchical  perspective  on  complexity  (the  C*  hypothesis) 
and  the  Memory  Evolutive  Systems  are  closely  related 
theories  (8).  We  have  published  a  model  of  a  cell  based  on 
category  theory  and  pointed  out  the  critical  role  of  time 
scales  in  decision  making  processes  (17).  Enhanced  graph 
theory,  which  is  somewhat  related  to  category  theory,  is 
widely  used  in  the  computations  of  chemical  structures.  It 
appears  that  augmented  category  and  graph  theories,  along 
with  singularity  theory  will  play  a  substantial  role  in  the 
construction  of  emergent  dynamics  of  complex  systems. 
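
The three ingredients of category theory named above (objects, morphisms, and compositions) can be illustrated with a small Python sketch. It is only a toy under stated assumptions: objects are plain strings, morphisms are labeled arrows, and the labels `bind`, `polymerize` and `assemble` are hypothetical, chosen to echo the degrees of organization rather than taken from the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphism:
    name: str
    source: str
    target: str
    is_identity: bool = False


def identity(obj: str) -> Morphism:
    """The identity arrow on an object."""
    return Morphism(f"id_{obj}", obj, obj, is_identity=True)


def compose(g: Morphism, f: Morphism) -> Morphism:
    """Return 'g after f'; defined only when f ends where g starts."""
    if f.target != g.source:
        raise ValueError(f"cannot compose {g.name} after {f.name}")
    if f.is_identity:
        return g
    if g.is_identity:
        return f
    return Morphism(f"{g.name}.{f.name}", f.source, g.target)


# Illustrative arrows between hypothetical objects.
bind = Morphism("bind", "atoms", "molecules")
polymerize = Morphism("polymerize", "molecules", "biomacromolecules")
assemble = Morphism("assemble", "biomacromolecules", "cells")
```

Composition in this sketch is associative and respects identities, which are the only laws a category requires.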

A  robust,  constructive,  scientific  notation  is  needed  to 
support  the  mathematical  explorations  of  organizations  of 
interdependent  objects. 

3.  Proposal  for  a  robust  notation  for 
organized  systems 

Historically,  a  symbol  is  used  to  denote,  among  other 
possibilities,  a  category  or  object  or  concept  or  belief  or  a 
class  of  categories,  objects,  concepts  or  beliefs. 

Let O° be any degree of organization of an object.

Here, a symbol of the form O° is used to denote a general
class  of  organization.  The  natural  question  now  arises: 
How  can  I  create  a  procedure  for  assigning  scientific 
meaning  to  the  natural  sequence  of  symbols?  Since 
science  has  a  number  of  organizational  principles,  a 
selection  is  necessary  to  construct  an  ordering  relationship. 
The  objective  of  the  procedure  is  to  construct  a  notation 
such  that  a  one  to  one  correspondence  between  the 
languages of scientific observations and the symbolic
representation of the degree of organization creates a
conceptual basis for the enumeration of complexity.
Therefore, I choose to select the meaning of the symbols
O°1, O°2, O°3, ... such that a construction of one to one
correspondences  between  natural  numbers  and  material 
objects  is  feasible.  I  have  selected  the  following  specific 
semantic  ordering  for  material  objects  of  a  cell: 

O°1 subatomic particles
O°2 atoms
O°3 molecules
O°4 biomacromolecules
O°5 cells
O°6 ecoment
O°7 environment

This  ordering  relationship  was  selected  for  living  systems; 
it  may  also  be  sufficient  for  general  chemical  and  physical 
usage,  since  it  uses  the  atomic  table  as  a  guide.  The 
meaning of these symbols is given below.

O°1, subatomic particles, consist of three material
objects:  protons  (+),  electrons  (-)  and  neutrons. 

O°2, atoms, consists of somewhat over one hundred
unique  objects,  composed  from  the  three  subatomic 
particles.  The  composition  of  atoms  from  particles 
can  be  enumerated  systematically  in  terms  of  the 
natural  numbers,  preserving  the  one  to  one 
correspondence  between  particles  and  particles  bound 
into  atoms. 

O°3, molecules, consists of a very large number of
different  material  objects  composed  from  atoms  or 
ions.  The  binding  operations  which  form  neutral 
molecules  from  atoms  preserve  one  to  one 
correspondences  between  atoms  and  atoms  bound  into 
molecules.  These  binding  operations  concomitantly 
form  the  patterns  of  chemical  structures.  The  patterns 
formed  in  molecules  are  created  from  the  organization 
of  the  quantum  numbers. 

O°4, biomacromolecules, consists of many material
objects (not an infinite number), composed from
specific classes of molecules. The binding operations
preserve one to one correspondences between
subgraphs of the molecules and the bound subgraphs
within the macromolecules.

O°5, cells, consists of living objects which can be
represented as having a boundary sustained by a genetic
system. The genetic system is composed of
components consisting of O°1 subatomic
particles, O°2 atoms, O°3 molecules, and O°4
biomacromolecules. The ordering relationships
among the components of a cell are not completely
specified by internal relationships. Nonetheless,
orderings relate all the essential components of a cell.

O°6, ecoment, consists of the surrounds of the cell. In
natural systems, the surround may include O°1
subatomic particles, O°2 atoms, O°3 molecules, O°4
biomacromolecules, O°5 other cells and potentially
more  highly  organized  systems.  The  ordering 
relationships  among  the  components  of  an  ecoment 
are  not  readily  specified  for  natural  systems.  However, 
the  minimal  essential  components  of  an  ecoment  are 
known  for  many  cells  and  higher  organisms  -  they  are 
named  "essential  nutrients." 

O°7, environment, is the embedding system for the
surrounds  of  organisms. 
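
The ordering of the seven degrees of organization described above can be sketched as an ordered enumeration. This is a minimal illustration, assuming the degrees are numbered 1 to 7 in the order listed; the names `Degree` and `possible_components` are hypothetical, and "any lower degree may be a component" is a simplification of the descriptions given for cells and the ecoment.

```python
from enum import IntEnum


class Degree(IntEnum):
    """Degrees of organization for a cell, in the order listed in the text."""
    SUBATOMIC_PARTICLES = 1
    ATOMS = 2
    MOLECULES = 3
    BIOMACROMOLECULES = 4
    CELLS = 5
    ECOMENT = 6
    ENVIRONMENT = 7


def possible_components(whole: Degree) -> list:
    """Degrees that may appear as components of `whole` (all lower degrees)."""
    return [d for d in Degree if d < whole]


cell_components = possible_components(Degree.CELLS)
ecoment_components = possible_components(Degree.ECOMENT)
```

Here `cell_components` holds the four degrees below the cell, and `ecoment_components` additionally includes other cells, matching the verbal descriptions.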

As suggested by the ordering relationships, the term
ecoment is introduced to describe the immediate
surroundings of the living organism. It is these immediate
surroundings which provide the nutrients (the necessary
and sufficient conditions) for sustaining life. The term
ecoment  implies  a  specific  subset  of  the  environment 
which  is  experienced  by  the  organism. 

A  sequence  of  degrees  of  organization  can  be  designed  for 
other  systems.  These  designs  may  require  either  a  smaller 
number  or  a  larger  number  of  degrees  of  organization.  For 
example,  less  complex  systems  composed  from  mechanical 
and  /  or  electrical  components  would  not  require  as  many 
degrees  of  organization  as  a  living  organism. 

As  noted  above,  the  conceptual  basis  of  this  proposal  for  a 
scientific  semiotics  is  the  recognition  that  three  degrees  of 
organization  are  essential  to  predicting  the  complete 
behavior of any system and these three O° must form an
ordering relationship (6,7). Philosophically, these three
degrees of organization can be expressed in common
language as the parts, the whole and the surroundings of
the whole. Co-extensive with this language and these
symbols are the scaling factors - the parts are smaller than
the whole, and the whole is smaller than its ecoment.

4.  Language  of  description  of  a  degree  of 
organization 

The  next  step  in  designing  a  scientific  semiotics  for 
accounting  for  complexity  is  to  list  a  common  set  of 
concepts  which  can  be  applied  to  a  specific  system.  Four 
terms  can  be  used  to  describe  the  behavior  of  any  degree  of 
organization  (7).  These  four  terms  are  interdependent  with 
each  other  and  must  be  defined  sequentially. 

Closure:  a  domain  of  discourse,  a  category,  a  system, 
an object, a unity.

Conformation: the components of the closure of a
system,  the  internal  patterns  of  the  system,  the 
relationship  among  the  parts  of  the  system,  a  three 
dimensional  depiction  of  the  internal  description  of  a 
system,  a  specific  geometric  and  algebraic  description 
of  components  of  the  system. 

Concatenation:  binding  parts  together,  linking 
changes  in  the  conformation,  changes  in  the  internal 
patterns  of  a  system,  the  specific  linkages  between 
parts  of  the  whole,  dynamic  processes  of  the  system 
linking  patterns  to  patterns. 

Cyclicity:  a  pattern  of  concatenations  which  sustains 
the  system,  the  potential  cyclic  walks  or  pathways 
over  the  conformations  of  the  system,  the  habitual 
behaviors  of  the  closure. 
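
One way to make the four terms concrete is a small data structure. The sketch below is only one possible reading, under these assumptions: closure is a name for the unity, conformation is a set of parts, concatenation is a set of directed links between parts, and cyclicity is the set of parts lying on at least one cyclic walk. The class name `OrganizedSystem` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OrganizedSystem:
    closure: str                                        # the unity described
    conformation: set = field(default_factory=set)      # its parts
    concatenation: set = field(default_factory=set)     # directed links (a, b)

    def link(self, a, b):
        """Bind part a to part b; both must lie inside the closure."""
        if a not in self.conformation or b not in self.conformation:
            raise ValueError("links must join parts inside the closure")
        self.concatenation.add((a, b))

    def cyclicity(self):
        """Return the parts that lie on at least one cyclic walk."""
        successors = {p: set() for p in self.conformation}
        for a, b in self.concatenation:
            successors[a].add(b)

        def reachable(start):
            # Parts reachable from start in one or more steps.
            seen, stack = set(), list(successors[start])
            while stack:
                node = stack.pop()
                if node not in seen:
                    seen.add(node)
                    stack.extend(successors[node])
            return seen

        return {p for p in self.conformation if p in reachable(p)}


# Hypothetical four-part system with one sustaining cycle A -> B -> C -> A.
system = OrganizedSystem("toy system", {"A", "B", "C", "D"})
for a, b in [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]:
    system.link(a, b)
```

The sequential dependence of the definitions shows up in the code: links require parts, and cycles require links.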

These  four  concepts  serve  as  the  basis  for  a  linguistic 
description  of  simple  and  complex  material  systems.  In 
principle,  each  concept  can  be  applied  to  the  enumeration 
of  each  degree  of  organization.  When  the  material  state  of 
a  system  is  known,  then  these  concepts  provide  a  basis  for 
specifying  specific  objects  and  may  allow  an  accounting  of 
the  complexity.  An  example  illustrating  the  application  of 
these  symbols  and  concepts  to  fermentation  of  wine  was 
reported  (Chandler,  in  press).  The  description  of  the 
fermentation  of  sugars  from  natural  juices  illustrates  the 
interrelationship  between  scientific  languages  and  scientific 
symbols at degrees O°1, ..., O°6. Indeed, the organization
of the entire internal process of fermentation can be
symbolized in terms of causal pathways among the labeled
asymmetric pseudographs and dynamic processes
symbolizing  the  role  of  collaborative  configurations  in 
generating  singularities  and  accelerating  the  flow  of  energy. 
Since  fermentation  has  been  cultivated  by  human  cultures 
since  ancient  times,  these  examples  can  also  be  extended  to 
higher  degrees  of  organization  to  describe  the  role  of 
fermentation  in  the  history  of  science  and  law.  For 
example,  cultural  controls  (laws)  governing  the  purity  and 
wholesomeness  of  beer  were  among  the  earliest  food  safety 
rules. 

5.  Complex  Causal  Pathways 

Examination of the differential equations representing
fermentation illustrates that the cause-effect relationships
in complex systems can be deterministic, highly organized,
and extremely selective. Of all the possible singularities
which could occur among the O°1, O°2, O°3, O°4, O°5 and
O°6 components of fermentation, the selection of one
specific causal walk from glucose to alcohol, among the
virtually infinite number of potential chemical
singularities, symbolizes a degree of determinism
unprecedented in the physical or chemical sciences. How
are we to describe or symbolize this unique set of
cause-effect relationships? I suggest that complex causal
pathways be organized in terms of the relative degrees of
organization being symbolized in the ordering relationships
of the languages of observations. This suggests four terms
for directional causality:

Bottom-up <--> Top-down

Outside-inward <--> Inside-outward

One description of cause-effect relationships (in
thermodynamics, for example) is based on energy flow.
This will be called 'bottom-up' causality. Processes at
all degrees of organization involve action and energy flow
and hence bottom-up causality. Merely describing energy
flow fails to predict the unprecedented determinism
observed in living systems. For any particular
hierarchical structure, the sum of the mass will uniformly
increase with the degree of organization (since the mass of
the whole must be greater than the mass of any one
component of the whole). When one or more higher
degrees of organization collaborate with lower degrees of
organization, it can be formally defined as top-down
causality. Thirdly, examination of the yeast cell
fermentation example (that is, the material process of a
brewery) indicates the necessary role of the ecoment in
organized processes. The ecoment is driving the
conformation from outside the closure by supplying a
component necessary to the cycles. This is named
outside-inward causality. Finally, a cell sustains its boundary, its
internal structures and its capacity to divide by responding
as a whole to its ecoment. This is termed inside-outward
causality and is quasi-conserved over generations. It
emerges from the organization of the first three forms.
These four directions of causality, within a categorical
perspective, are used to describe the directions of energy
flows within O° of a cell.
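
The four directions can be sketched as a classification of a cause-effect pair relative to a chosen whole. The mapping below is an assumption made for illustration, not a definition from the text: degrees are integers, everything at or below the degree of the whole counts as inside the closure, and everything above it counts as its surroundings. The function name `causal_direction` is hypothetical.

```python
def causal_direction(cause: int, effect: int, whole: int) -> str:
    """Classify a cause-effect pair of degrees relative to the closure `whole`.

    Assumption: degrees <= whole are inside the closure, degrees > whole
    belong to its surroundings (ecoment and beyond).
    """
    cause_inside = cause <= whole
    effect_inside = effect <= whole
    if cause_inside and effect_inside:
        if cause < effect:
            return "bottom-up"
        if cause > effect:
            return "top-down"
        return "same-degree"
    if effect_inside:
        return "outside-inward"   # surroundings act on the closure
    if cause_inside:
        return "inside-outward"   # the closure acts on its surroundings
    return "external"             # both lie outside the chosen closure


CELL = 5      # degree of the whole in the fermentation example
ECOMENT = 6   # its immediate surroundings
```

For instance, a nutrient supplied by the ecoment to a yeast cell would be classified as outside-inward under this reading.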

6.  Discussion 

Semiotics  is  the  study  of  meaning.  (The  Greek  root,  sema, 
can  be  translated  as  'mark,  sign.')  How  is  a  scientific 
description  of  natural  structure  related  to  our  common 
semiotics  and  transdisciplinary  communication? 

I seek to place organized systems [O°] within the higher
framework  by  comparing  the  constrained  scientific  usage 
with  the  more  common  general  usage.  In  "Signs,  an 
introduction  to  semiotics,"  the  American  semiotician,  T. 
Sebeok  [16],  asserts  that  six  species  of  signs  are  distinct. 
He  asserts  that  the  two  parts  of  the  sign  are  the  signifier 
and  the  signified.  With  regard  to  hierarchy,  he  asserts:  "It 
should  be  clearly  understood,  finally,  that  it  is  not  signs 
that  are  actually  being  classified,  but  more  precisely, 
aspects of signs: in other words, a given sign may - and
more often than not does - exhibit more than one aspect,
so that one must recognize differences in gradation. But it
is equally important to grasp that the hierarchic principle is
inherent in the architecture of any species of sign." The
contrast between a scientist's and a semiotician's (Sebeok)
view reflects the two cultures. Next, Sebeok's six species
are contrasted with the scientific semiotics developed here.

6.1.1. SIGNAL

Semiotic perspective: "The signal is a sign which
mechanically (naturally) or conventionally (artificially)
triggers some reaction on the part of the receiver."

O° systems perspective: The signal is an outside-inward
cause which pre-supposes the existence of a
poised  or  anticipatory  system.  In  the  absence  of 
knowledge  about  the  ratio  of  scales  of  the  objects,  the 
nature  of  the  relationship  between  the  signal  and  the 
poised  system  is  ill-defined.  Signals  may  or  may  not 
trigger  an  observable  reaction,  depending  on  the 
intensity  of  the  signal,  the  degree  of  organization  and 
numerous  other  variables. 

6.1.2.  SYMPTOM 

Semiotic perspective: "A symptom is a compulsive,
automatic, non-arbitrary sign such that the signifier is
coupled with the signified in the manner of a natural
link."

O° systems perspective: A symptom can be viewed
as  a  special  (abnormal)  internal  conformation.  It  can 
inform  the  skilled  observer  on  one  specific  category  or 
on  a  collection  of  linkages  of  a  hierarchy  of  categories. 

6.1.3.  ICONIC 

Semiotic perspective: "A sign is said to be iconic
when there is a topological similarity between a
signifier and its denotata." (Sebeok cites Peirce's
1867 paper 'On a New List of Categories', in which
Peirce asserted three kinds of representations: (a)
likenesses (icons) 'whose relation to their objects is a
mere community in some quality'; (b) indices 'whose
relation to their objects consists in a correspondence in
fact'; and (c) symbols, those 'the ground of whose
relation to their objects is an imputed quality.'
Subsequently, the category of icon was partitioned
into three subclasses, images, diagrams and metaphors,
by Peirce.)

O° systems perspective: Insofar as one has a likeness
(icon, mental image) in mind for each degree of
organization, one seeks a topological representation
which is consistent among all degrees of
organization, not just an immediate one. O°
complexity  is  grounded  on  observations  of  many 
alternative  iconic  representations  for  a  single  closure 
as  illustrated  by  correspondences  between  the  degrees 
of  organization. 

6.1.4. INDEX

Semiotic perspective: "A sign is said to be indexic
insofar as its signifier is contiguous with its signified,
or is a sample of it. The term contiguous is not to be
interpreted literally in this definition as necessarily
meaning 'adjoining or adjacent'; thus Polaris may be
considered an index of the celestial pole..."

O° systems perspective: Organizational hierarchies
require two radically distinctive types of indices -
vertical  and  horizontal.  Vertical  indices  represent  the 
composition  of  new  degrees  of  organizations  with 
emergent  properties.  Each  degree  of  vertical 
organization  is  described  with  a  novel  logical  language 
of  dynamic  description.  Horizontal  indices  may  simply 
denote  an  ordering  relationship  on  an  arbitrary  list 
within  one  language  of  description.  Horizontal 
indices  can  create  ordering  relationships  along  a  causal 
pathway. 

6.1.5.  SYMBOLS 

Semiotic perspective: "A sign without either similarity
or contiguity, but only with a conventional link
between its signifier and its denotata and with an
intensional class for its designatum, is called a symbol."

O° systems perspective: Scientific usage of a symbol
requires  observations.  If  one  views  science  as  a 
unified  body  of  knowledge,  then  a  specification  of 
meaning  is  within  organized  natural  structures. 

6.1.6.  NAME 

Semiotic perspective: "A sign which has an
extensional class for its designatum is called a name.
An extensional definition of a class is one given 'by
listing the names of the members or by pointing to
every member successively'" (see Reichenbach, 1948).

O° systems perspective: A name within a given
degree  of  organization  constrains  the  material 
composition  of  the  object  to  specific  patterns  of 
relationships.  For  a  specific  living  system,  a 
biological  name  intrinsically  distinguishes  a  unique 
historical  (genetic)  path. 

6.2  Role  of  Supra-ordination  in  scientific 
semiotic  and  organizational  theory 

Distinctions  between  scientific  notation  and  general 
semiotics go beyond the meanings of definitions. Sebeok
writes of applying the "Law of Inverse Variation": "The
terms sign, symbol, emblem, and insignia are here
arranged in the order of their subordination, each term to
the left being a genus of its subclass to the right and each
term to the right being a species of its genus to the left.
Thus, the denotation of these terms decreases; for example,
the extension of 'symbol' includes the extension of
'emblem' but not conversely."

The  example  used  to  illustrate  the  Law  of  Inverse  Variation 
consists  of  a  series  of  four  terms:  sign,  symbol,  emblem 
and  insignia.  Set  theoretic  language  is  used  to  indicate  that 
the  denotation  of  each  term  decreases  in  a  subordination  of 
semantic  content  —  or,  in  mathematical  terms,  a  nested 
sequence of subsets of denotata exists among the four
terms. If viewed as a discrete sequence of natural numbers,
the Law of Inverse Variation implies that the extension of a
term may become vanishingly small.
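
The nested sequence of subsets described above can be checked mechanically. The following Python sketch assumes, purely for illustration, that the extension of each term is a finite set; the member names are hypothetical and carry no semiotic claim.

```python
def is_strictly_nested(extensions):
    """True when each extension is a proper subset of the one before it."""
    return all(later < earlier
               for earlier, later in zip(extensions, extensions[1:]))


# Hypothetical denotata, chosen only to make the nesting visible.
sign = {"smoke", "word", "flag", "crest", "badge"}
symbol = {"word", "flag", "crest", "badge"}
emblem = {"crest", "badge"}
insignia = {"badge"}

chain = [sign, symbol, emblem, insignia]
sizes = [len(extension) for extension in chain]
```

In this toy case the sizes shrink at every step of subordination, which is the sense in which the extension of a term may become vanishingly small.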

Supra-ordination,  rather  than  sub-ordination,  is  intrinsic  to 
the  semiotics  of  cellular  organization  presented  here. 
Supra-ordination  follows  a  historical,  evolutionary 
perspective  of  the  emergence  of  complexity  —  a  path  of 
creativity  from  the  simple  to  the  more  complex  [9,  12,  14, 
3, 4]. Supra-ordination is composed by combining lower
symbols to form higher symbols, O°1, O°2, O°3, ..., in a
sequence of natural numbers. While the law of
subordination of general semiotics deduces a decreasing
domain, this scientific semiotics generates an
exponentially increasing domain within the material
structure of natural degrees of organization. The number of
potential combinatorial binding relationships is
mathematically unending. No stopping rule for the
series O°1, O°2, O°3, ... is known. Assuming the
historical trends continue, this implies that more highly
cooperative systems will emerge in the future. This is not
a novel idea, but it provides a new line of support for the
thesis [9].
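
The exponential growth of the domain can be illustrated with a deliberately simplified count. The sketch assumes that binding is modeled as forming linear sequences over a fixed alphabet of parts (for example, four DNA bases); real binding relationships are more constrained, so this is an illustration of the growth rate only. The function name `sequence_count` is hypothetical.

```python
def sequence_count(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences of `length` parts."""
    return alphabet_size ** length


# Growth for a four-letter alphabet as the sequence length increases.
growth = [sequence_count(4, n) for n in range(1, 6)]
```

Each added position multiplies the number of possible sequences by the alphabet size, so the count has no upper bound as the length grows.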

This O° hypothesis provides a basis for scientific semiotics
which is both richer and poorer than linguistically-based
semiotics. O° is richer in the sense that scientific
semiotics is based on enumeration of material complexity,
hence it demands both algebraic and geometrical patterns
within the symbology and semantics of objectivity. It is
also richer in the sense that O° provides a basis for the
transdisciplinary communities to communicate within a
common symbolic framework. By the same token, this
scientific semiotics is poorer than general semiotics when
attempting to aggregate human desires, values, feelings,
longings, ambitions or objectives. Spiritual traditions,
literature, the performing arts, and the sciences continue to
create more highly organized learned structures, ever more
remote from the autonomy of O°1 particles. Natural
history  is  a  history  of  generating  options  and  generating 
selections  of  ever  greater  complexity. 

Discussions  with  B.  Chandler,  A.  Ehresmann,  K.  Harbaugh,  J. 
Long,  R.  Khuri,  A.  Vogt,  and  B.  Weems  have  added  to  the 
pleasure of this composition.

Abstract. This paper presents the design
and  implementation  of  a  holonic 
approach  to  intelligent  control  of  a 
flexible  transfer  system  which  is 
regarded as a multi-agent system. After a
brief introduction to the concepts of
holonic manufacturing, the design of a
multi-agent architecture for the flexible
transfer system is presented. Then a
behavioral control strategy supporting
the triplet Planning-Execution-Monitoring
is presented together with its
implementation for the control of the
work-piece transfer on the optimal path.

Keywords. Multi-agent architecture,
holonic control, flexible transfer system.


1.  INTRODUCTION  -  HOLARCHIES  OF 
HOLONS 

In the past few years the application of the socio-economic
principles developed by Arthur Koestler in his book "The Ghost in the
Machine" has revolutionized the organizational and management
principles of the manufacturing industry worldwide. Briefly, holonic
manufacturing systems can be regarded as a unified way to approach
the hierarchical control of any manufacturing unit, from the
production process to the whole enterprise level. Broadly speaking,
holons have most of the generic properties of agents and are mainly
distinguished by their dual "whole-in-the-part" nesting properties:
superordination to the part(s) which they contain and subordination
to the bigger whole to which they belong. Families of "wholes within
wholes" (named holarchies) replicate the same properties starting
from an overall highest level which includes several "sub-wholes"
with similar properties, these in turn including other
"sub-sub-wholes", etc., until a last level contains atomic parts
which do not contain any other "sub-wholes" and which usually have
different properties than the (sub-)wholes (Fig. 1). As an example
one can consider a federation of states (e.g. USA or Canada) within a
country. The country, as a holarchy of states, has its own political
structure. Each state in turn replicates its own political structure.
Within each state there are districts with their own political
organizations. These are the atomic elements which do not contain any
other holons.
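
The nesting described above can be illustrated with a small recursive
data structure. This is only a sketch: the class name Holon, its
methods, and the state and district names are assumptions made for
illustration and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Holon:
    """A whole that may contain sub-holons and may be part of a larger whole."""
    name: str
    parts: List["Holon"] = field(default_factory=list)

    def is_atomic(self) -> bool:
        # Atomic holons sit at the last level and contain no sub-wholes.
        return not self.parts

    def depth(self) -> int:
        # Number of nested levels, counting this holon itself.
        return 1 + max((p.depth() for p in self.parts), default=0)

    def atoms(self) -> List[str]:
        # Names of all atomic parts reachable from this holon.
        if self.is_atomic():
            return [self.name]
        return [a for p in self.parts for a in p.atoms()]


# Hypothetical federation: country -> states -> districts.
country = Holon("country", [
    Holon("state-A", [Holon("district-A1"), Holon("district-A2")]),
    Holon("state-B", [Holon("district-B1")]),
])
```

Here every level is represented by the same type, which mirrors the
idea that sub-wholes replicate the properties of the whole; only the
districts are atomic.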

2.  THE  FLEXIBLE  TRANSFER  SYSTEM 

In  this  paper  a  holonic  approach  to  the 
implementation  of  the  control  strategy 
for  a  flexible  transfer  system  is 
presented.  The  system,  Fig.  2,  consists  of  a 
set  of  pallets  (organized  in  a  grid  which 
can  have  different  shapes)  which 
transfer  a  certain  number  of  work  pieces 
simultaneously.  Each  pallet  is  regarded  as 
an  agent,  which  is  a  sub-holon  in  the 
holarchy  represented  by  the  whole 
system  of  pallets  constituting  the  transfer 
system. 
Fig. 1: Holarchy as a system of nested holons [from 9]


Each pallet (regarded as an agent/sub-holon) has the ability to act
autonomously via a computational system which includes the ability to
process sensor information and data carriage, and which also allows
communication with the other pallets (agents/sub-holons) via a
communication module (Fig. 3). The spatial configuration of the
holarchy can be changed while preserving the overall control
strategy, which allows each holon to move the work piece in any
direction via an OMNI-wheels mechanism. The overall goal is to
minimize the transfer time of all work pieces from their initial
places to their desired destinations while minimizing conflict
situations (e.g. when two work pieces would have to cross the same
pallet at the same time on their minimal paths).

3. THE PROPOSED ARCHITECTURE FOR THE FLEXIBLE TRANSFER
SYSTEM REGARDED AS A MULTI-AGENT SYSTEM

The representation of a pallet presented in Fig. 3 can be regarded as
an intelligent agent, as the term is generically understood in the AI
community: an entity capable of understanding its environment and
able to deliberate and reason independently about how to use its own
resources to reach a desired goal. The agent has the ability to move
in a certain manner and can manipulate objects in its environment.
The main concern from an architectural perspective is to find the
most adequate way for agents to interact and cooperate such that they
can achieve their goals optimally (in our case, the fastest transfer
of the work pieces from their initial positions to the desired
destinations).
As presented in Fig. 4, the proposed architecture for the flexible
transfer system consists of groups (sets) of specialist agents. Each
group is able to perform a certain kind of specialized task. The
interactions among specialist agents belonging to the same group are
coordinated by a supervisor of the group. The supervisor's goal is to
provide its agents with necessary information which would allow them
to achieve their own goals. A group of agents can be regarded as a
blackboard system where specialists represent knowledge sources
regarding execution of a certain kind of task and the supervisor
plays the role of both blackboard and control mechanism of the
system. Since the architecture contains many supervisors, the
proposed multi-agent system model can be viewed as a multi-blackboard
architecture.

Fig. 2: The Flexible Transfer System

Fig. 3: Generic structure of a pallet (agent/sub-holon) in the
flexible transfer system holarchy


The two main classes of agents, supervisor and specialist, are
instantiated for the flexible transfer system as follows. Any
"leader" pallet (i.e. any pallet that carries a work piece) is a
supervisor for the whole time it carries the work piece or transports
it towards its next carrier pallet (leader). All the potential next
carriers (potential next "leaders") are specialist agents belonging
to the same group, for which the group supervisor is the pallet that
carries the work piece at that moment in time. There are as many
groups of specialist agents as there are work pieces transported at
once. Each group has as its supervisor the carrier of the work piece.
Each group is dynamically organized on-line in order to form a path
to transport the work piece to its final destination. Group
supervisors cooperate in order to solve conflicts in case two paths
cross each other or overlap on a certain portion. In extreme
situations decisions are made through negotiation as to which piece
should pass first through the same cell (pallet).

Fig. 4: Multi-agent architectural model of the FTS

Each specialist agent on a path will become a supervisor when the
work piece reaches it, so the configuration of the architecture
changes dynamically. The next carrier changes from specialist agent
to supervisor, and in turn the last supervisor becomes a specialist
agent after the work piece has been transported onto the next
carrier. Each group defines a class of agents consisting of
individual objects which are specialist agents. Interactions between
groups are managed by the group supervisors at each step. The goal of
each group is to transport the piece carried by the supervisor to the
desired destination as soon as possible. Supervisors are able to do
optimal path planning, conflict resolution, cooperation, negotiation
and other intelligent tasks.
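
The dynamic exchange of roles described above can be sketched as
follows. The names Pallet, hand_over and transport are hypothetical,
and the path is assumed to be given; path planning and negotiation
between groups are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pallet:
    name: str
    role: str = "specialist"          # "specialist" or "supervisor"
    workpiece: Optional[str] = None


def hand_over(current: Pallet, nxt: Pallet) -> None:
    """Move the work piece to the next carrier and swap the two roles."""
    if current.role != "supervisor" or current.workpiece is None:
        raise ValueError("only a pallet carrying a work piece can hand it over")
    nxt.workpiece, current.workpiece = current.workpiece, None
    nxt.role, current.role = "supervisor", "specialist"


def transport(path: List[Pallet], piece: str) -> List[str]:
    """Carry `piece` along `path`; return the successive supervisors."""
    path[0].role, path[0].workpiece = "supervisor", piece
    leaders = [path[0].name]
    for current, nxt in zip(path, path[1:]):
        hand_over(current, nxt)
        leaders.append(nxt.name)
    return leaders


path = [Pallet("p1"), Pallet("p2"), Pallet("p3")]
leaders = transport(path, "wp-1")
```

At any moment exactly one pallet on the path is the supervisor, namely
the one currently carrying the work piece.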

Given  that  the  upper  limit  in  the  number 
of  supervisors  equals  the  maximal 
number  of  pieces  that  can  be  transported 
at  once,  this  number  has  to  be  determined 
together  with  the  analysis  of  the 
feasibility  of  the  proposed  control 
solution. Likewise, an upper limit on the number of work pieces that
can be transported at once by the flexible transfer system without
the need for a global supervisor has to be determined. Once this
threshold is exceeded the control architecture has to be enhanced
with a higher level of overall control implemented via a global
supervisor. The role of the global supervisor would be to manage and
coordinate the interactions between group supervisors, solve
conflicts between them and help their negotiation strategies reach
adequate decisions by setting priorities according to a priori
established rules.
Within a group, specialist agents have two types of communication:
horizontal (characteristic of the autonomous architectural paradigm)
and vertical (characteristic of blackboard architectures).

Horizontal  communication  aims  to 
support  interactions  between  specialists 
belonging  to  the  same  group. 

Vertical  communication  indicates  how 
specialists  interact  with  their  group 
supervisor. 

Specialist agents are endowed with capabilities for perception
(sensors), reasoning and planning (i.e., they are cognitive). They
must have the capability
to  adapt  themselves  to  unexpected 
situations.  This  is  achieved  through 
adequate  adaptive  and  dynamic  control 
strategies. 


4.  HOLONIC  CONTROL  WITH 
PLANNING-EXECUTION-MONITORING 

The proposed control strategy is based on the triplet of Planning,
Executing and Monitoring (PEM for short). In designing
intelligent  multi-agent  control  and 
especially  in  building  a  logical  model,  all 
activities  (manipulation,  motion  and 
sensing)  are  implemented  with 
corresponding  planning,  executing  and 
monitoring  elements.  Each  activity  is 
expressed  as  an  instance  of  the  PEM 
triplet. 

The connections between the generic activities Planning, Execution
and Monitoring are made by means of control, data flow and related
models. There are different choices for carrying out the activation
and deactivation of planning, executing and monitoring. The ability
to cope with re-planning when dealing with unexpected situations is
essential. It is the responsibility of executing and monitoring to
decide whether the current plan is still valid. If not, re-planning
is activated. The actual decomposition takes place in the execution
phase, while the planned operations are activated. The execution
mechanism takes care of the activation and deactivation of planned
operations as described in the plan. Activation of a decomposable
operation means activation of the meta-control at the immediately
lower level.

The  atomic  operations  pass  control 
commands  and  sensing  requests  to 
actuators  and  sensors  through  device- 
drivers.  They  process  and  pass 
information  for  the  use  of  planners  and 
executors  to  the  local  models,  either  to 
their  own  level  or  upper  levels  of  the 
control  hierarchy.  Atomic  operations  also 
update  the  global  models  and  therefore 
are  used  to  maintain  model  consistency. 
The way the system reacts to erroneous or unexpected situations (not
covered by the conditional structures in the plans) depends not only
on the capability of the planners to find recovery plans but also on
the timing logic of the meta-control. Reactive behavior can be
achieved only if replanning and the subsequent execution are done as
quickly as possible.
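
A minimal sketch of one PEM cycle with replanning is given below. It
assumes a single level (no nested meta-control), and the function
names run_pem, plan_fn, execute_fn and monitor_fn, as well as the
one-dimensional example, are hypothetical and serve only to
illustrate the control flow.

```python
from typing import Callable, List, Tuple

GOAL = 3
failures = {1}  # positions at which the first execution attempt fails


def plan_fn(pos: int) -> List[str]:
    # Planning: one "step" operation per remaining cell.
    return ["step"] * (GOAL - pos)


def execute_fn(pos: int, op: str) -> int:
    # Execution: a simulated disturbance blocks the move once.
    if pos in failures:
        failures.discard(pos)
        return pos
    return pos + 1


def monitor_fn(pos: int, op: str, new_pos: int) -> bool:
    # Monitoring: the plan stays valid only if the step took effect.
    return new_pos == pos + 1


def run_pem(plan_fn: Callable, execute_fn: Callable, monitor_fn: Callable,
            state: int, max_replans: int = 3) -> Tuple[int, list]:
    """Plan, then execute and monitor each operation; replan on failure."""
    log = []
    replans = 0
    plan = plan_fn(state)
    log.append(("plan", tuple(plan)))
    while plan:
        op = plan.pop(0)
        new_state = execute_fn(state, op)
        if monitor_fn(state, op, new_state):
            state = new_state
            log.append(("done", op))
        else:
            # The current plan is no longer valid: activate replanning.
            if replans == max_replans:
                raise RuntimeError("replanning limit reached")
            replans += 1
            plan = plan_fn(state)
            log.append(("replan", tuple(plan)))
    return state, log


final, log = run_pem(plan_fn, execute_fn, monitor_fn, 0)
```

In this run the second step fails once, monitoring rejects the plan,
and a shorter plan is produced from the current position.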

The proposed control strategy is inspired by the synchronous planning
principle of Albus [2]. In his model, Albus assumes instant planning
done synchronously at a frequency typical of the level of the
hierarchy. This means that ordinary planning is signaled to start by
a clock mechanism, while replanning is signaled asynchronously by
erroneous situations. The distinctive aspect of our strategy is to
decompose the execution mechanism into (nested) lower-level planning
and execution mechanisms. Planners are considered to do all the
planning-related activities, including the computation based on
sensory information during the execution. After a partial plan has
been created, the execution starts. Planning is activated again when
acquired sensory information has to be transformed into plan
parameters. There are several ways in which the executor and the
planner can be synchronized (i.e. connected to the data in the local
model): either with a direct control signal from the executor to the
planning function of the planner, in case the goal exists, or by
passing the control signal via the meta-control, if a goal does not
exist.


8.  IMPLEMENTATION  OF  THE  CONTROL 
STRATEGY 

Considering an object-oriented, mailbox-based communication approach,
an agent is regarded as a concurrent object that blocks while waiting
for the next message to execute. Each agent is characterized by its
mailbox, which is managed by a thread. This allows asynchronous
communication between agents which are physically distributed over a
set of stations.

We define five classes of objects: object, class, meta-super,
specialist and supervisor. The basic executing model contains the
following classes: object, class and meta-super. The class object has
two kinds of actions: ask, which enables an object to get the value
of an attribute, and give, which enables an object to assign a value
to an attribute. In the class class, the action new creates a new
class. In the class meta-super, the action create-super instantiates
supervisor objects. The class meta-super makes it possible to
generate as many control levels as demanded by the complexity of the
system. The action broadcast allows a supervisor to communicate with
its group's agents.
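
The mailbox model can be sketched with a thread that blocks on a
queue. Only the ask and give actions of the class object are shown;
the class name Agent, the message layout and the reply queue are
assumptions of this sketch, not the paper's implementation.

```python
import queue
import threading


class Agent:
    """Concurrent object: a thread blocks on the mailbox until a message arrives."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            action, args, reply = self.mailbox.get()  # blocks while idle
            if action == "stop":
                break
            result = None
            if action == "give":
                # give: assign a value to an attribute
                key, value = args
                self.attributes[key] = value
            elif action == "ask":
                # ask: read the value of an attribute
                result = self.attributes.get(args[0])
            if reply is not None:
                reply.put(result)

    def send(self, action, *args, reply=None):
        # Asynchronous: the sender returns immediately.
        self.mailbox.put((action, args, reply))

    def stop(self):
        self.send("stop")
        self._thread.join()


pallet = Agent("pallet-1")
pallet.send("give", "position", (2, 3))
answer = queue.Queue()
pallet.send("ask", "position", reply=answer)
position = answer.get(timeout=1)
pallet.stop()
```

Because the mailbox is a first-in, first-out queue served by a single
thread, the ask message is processed after the give message and sees
the stored value.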

A group supervisor has a limited view of the environment. All group
supervisors whose physical environment is to be crossed by a piece
must cooperate in order to lead this piece to the desired point. Each
supervisor has the goal of moving the piece to the desired point and,
according to the environment it manages, provides a plan for the
move. The global supervisor determines which supervisors must
cooperate and specifies to them the trajectory the piece should
follow. Knowing the initial position of the piece as well as the
final position the piece should reach, the global supervisor has a
function sup-cooperate which takes the coordinates of these points
and its belief base (bell-base), and returns a list (List-of-sup) of
group supervisors which must cooperate to satisfy this goal.
Furthermore, its belief base contains the coordinates of a set of
points situated at the intersection between two physically possible
paths. These points represent "points" of cooperation between the
group supervisors coordinating the specialist groups on each of the
intersecting paths.

Consider for example S being a global supervisor and S1, S2 group
supervisors. To plan the move of a piece from position A to position
B the global supervisor should perform a sequence which can be
encoded as follows (given the above considerations):

sup-cooperate (bell-base, A, B, List-of-sup);
any (Si, Sj) in List-of-sup X List-of-sup
    inform (Si, M(x,y));
    inform (Sj, M(x,y));
broadcast (List-of-sup, plan-path);

In the above sequence, M is an intersection point between two
possible paths controlled by Si and Sj, and (x,y) are its
coordinates. As a result of executing the plan-path message, each
group supervisor plans a sequence of actions for enabling a robot to
move in its physical environment.
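
The sequence above might look as follows in Python. The layout of the
belief base (regions and intersection points), the selection rule
inside sup_cooperate, and the use of unordered pairs instead of the
full product List-of-sup X List-of-sup are all assumptions of this
sketch; the paper does not specify them.

```python
from itertools import combinations

# Hypothetical belief base of the global supervisor S.
belief_base = {
    "regions": {"S1": {(0, 0), (0, 1)}, "S2": {(0, 2), (0, 3)}},
    "intersections": {("S1", "S2"): (0, 1)},
}


def sup_cooperate(beliefs, a, b):
    # Assumption: a supervisor must cooperate if its region holds A or B.
    return sorted(s for s, cells in beliefs["regions"].items()
                  if a in cells or b in cells)


def plan_transfer(beliefs, a, b):
    """Return the (recipient, message) pairs in sending order."""
    messages = []
    sups = sup_cooperate(beliefs, a, b)
    for si, sj in combinations(sups, 2):
        m = beliefs["intersections"].get((si, sj))
        if m is not None:
            # Both supervisors learn their common intersection point M.
            messages.append((si, ("inform", m)))
            messages.append((sj, ("inform", m)))
    for s in sups:
        # Broadcast: every cooperating supervisor plans its own path.
        messages.append((s, ("plan-path",)))
    return messages


messages = plan_transfer(belief_base, (0, 0), (0, 3))
```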

For the supervisor S1 the sequence of operations would be:

move-straight   (30  pallets); 
turn-left   (90  degrees); 
move-straight    (30  pallets); 

For  the  supervisor  S2: 

move-straight   (25  pallets); 
turn-right    (90  degrees); 
move-straight   (20  pallets); 
turn-left    (90  degrees); 
move-straight   (20  pallets); 
turn-right    (90  degrees); 
move-straight   (15  pallets); 
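
The two plans above can be traced with a small interpreter. This is a
sketch under stated assumptions: positions are counted in pallets on
a square grid, each plan starts at the origin heading along +y, and
the function name follow is hypothetical.

```python
def follow(plan, start=(0, 0), heading=(0, 1)):
    """Return the final grid position after executing `plan`."""
    x, y = start
    dx, dy = heading
    for op, amount in plan:
        if op == "move-straight":
            x, y = x + dx * amount, y + dy * amount
        elif op in ("turn-left", "turn-right"):
            if amount != 90:
                raise ValueError("only 90-degree turns are handled")
            # Rotate the heading by a quarter turn.
            dx, dy = (-dy, dx) if op == "turn-left" else (dy, -dx)
        else:
            raise ValueError(f"unknown operation: {op}")
    return (x, y)


plan_s1 = [("move-straight", 30), ("turn-left", 90), ("move-straight", 30)]
plan_s2 = [("move-straight", 25), ("turn-right", 90), ("move-straight", 20),
           ("turn-left", 90), ("move-straight", 20), ("turn-right", 90),
           ("move-straight", 15)]

end_s1 = follow(plan_s1)   # (-30, 30)
end_s2 = follow(plan_s2)   # (35, 45)
```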
Once the piece has reached the desired position, it needs to be
integrated within the new group. The supervisor S2 must initialize
the new transfer by specifying the belief rule base to load. The goal
is to enable the transfer of the piece in the new environment. In
this way the supervisor represents a cooperation framework which
enables it to adopt a plan of actions by selecting
(meta-)capabilities of the group's agents.

9.  CONCLUSIONS 

In  summary,  the  proposed  architecture  is 
adjustable  to  applications  with  different 
levels  of  complexity.  The  control  of  the 
complexity  is  managed  via  the  number  of 
hierarchical  levels  chosen  for  the 
particular  instantiation.  In  its  simplest 
instantiation  the  architecture  consists  of 
a  single  group  having  one  supervisor, 
which  represents  a  blackboard 
configuration. There is practically no upper limit for the complexity
of the system to be represented, which can be encapsulated by
increasing the number of hierarchical levels in the architecture.
This increases the flexibility and efficiency of the control strategy
responsible for the adequate functionality of the proposed
multi-agent architecture. The hierarchical control levels may be
exploited as meta-levels for learning in a multi-agent environment.
An agent has its own capability of learning, but if it encounters a
non-deterministic state it must interact with its supervisor to
decide what actions it should choose to pass into a deterministic
state.
ABSTRACT

Multi-agent  systems  provide  responsive  and 
reconfigurable  information-processing  structures  that  can 
facilitate  flexible,  efficient  use  of  manufacturing  resources  in  a 
rapidly  changing  environment.  For  maximum  scalability  and 
fault  tolerance,  agents  relevant  for  a  particular  task  may  be 
dynamically  gathered  into  communities  where  they  are 
coordinated  and  arbitrated  by  mediator  agents.  In  a  dynamic 
manufacturing  environment,  the  collective  behavior  of  such  a 
system  is  an  emergent  property;  hence,  robust  learning 
mechanisms  are  required  for  effective  mediation.  A  learning 
mechanism  that  enhances  the  capabilities  of  a  multi-agent 
concurrent  design  and  manufacturing  system  is  presented,  and 
associated  implementation  issues  are  discussed. 
Keywords:  Multi-Agent  System;  Manufacturing;  Emergent 
Behavior;  Learning;  Coordination;  Mediator. 

1.0  INTRODUCTION 

Heterogeneous  multi-agent  systems,  comprising  a 
network  of  dissimilar  autonomous  interacting  software  entities 
called  intelligent  agents  (Wooldridge  and  Jennings,  1995),  are 
becoming  increasingly  popular  for  implementing  distributed, 
cooperative  manufacturing  planning  and  control  systems. 
Such  systems  provide  responsive  and  reconfigurable 
information-processing  structures  that  facilitate  flexible, 
efficient  use  of  manufacturing  resources  in  a  rapidly  changing 
environment. 

Intelligent  agents  are  knowledgeable  in  their  local 
domains  and  share  the  responsibility  of  achieving  multi- 
objective  system  goals  through  concurrent  negotiation.  The 
relevant  agents  for  a  particular  task  may  be  gathered  into  a 
special  community  that  is  dynamically  formed,  coordinated 
and  arbitrated  by  mediator  agents  (Wiederhold,  1992)  for 
maximum  scalability  and  fault  tolerance.  In  this  architecture, 
the  responsibilities  of  a  traditional  centralized  supervisory 
manager  are  distributed  among  the  mediator  agents  (Duffie, 
1990). 

These mediators are also coordinators, possessing organizational
knowledge and meta-level rules necessary to facilitate cooperation
among intra- and inter-agent communities. In a dynamic manufacturing
environment, their knowledge cannot be static, since the system's
behavior depends on problem solving. Thus, the system's collective
behavior is an emergent property, requiring robust learning
mechanisms for effective mediation.

One  possible  approach  is  learning  from  history.  For 
example,  in  manufacturing  planning  and  scheduling  many 
repetitive  activities  can  be  learned  for  reuse.  For  this,  the 
intelligent  system  needs  encoding  models  and  specialized 
reasoning  agents  to  decide  upon  the  use  or  rejection  of  learned 
transactions.  The  learning  ability  facilitates  improvement  in 
the  performance  of  both  individual  agents  and  the  multi-agent 
system  overall. 

In  addition  to  learning,  a  sophisticated  mechanism  for 
propagating  the  state  of  agent  interactions  into  the  future  is 
required.  This  attribute  provides  an  intelligent  forecasting 
medium  through  which  multi-agent  system  responses  can  be 
adjusted  and  refined. 

2.0  RELATED  WORK 

In  recent  years,  researchers  have  explored  various 
approaches  for  learning  and  emergent  behavior  in  multi-agent 
systems.  The  notion  of  "behavior"  has  become  recognized  as  a 
fundamental  building  block  in  the  artificial  intelligence, 
control,  and  learning  research  community  (Mataric,  1995). 
Behavior  is  seen  as  a  regulator  in  interaction  dynamics 
between  agents  and  the  environment.  This  phenomenon  has 
also  been  observed  by  Steels  (1990);  Maes  (1990);  Brooks 
(1991). 

The TouringMachines agent design architecture
(Ferguson,  1992)  has  been  used  to  control  multi-agent 
activities  in  dynamic  environments.  The  Heterogeneous 
Reasoning  and  Mediator  System  (HERMES)  (Subramanian, 
1996)  is  an  architecture  designed  to  integrate  heterogeneous 
sources  of  data  and  reasoning  paradigms  through  the  creation 
of  specialized  mediators. 

Distributed  Refinement  Among  Multiple  Agents 
(DRAMA)  was  developed  as  a  machine-learning  technique 
(Byrne  and  Edwards,  1995)  to  improve  agent  knowledge.  This 
technique  provides  agents  with  the  ability  to  learn  their 
organizational  roles  in  a  distributed  cooperative  search.  In 
addition,  the  agents  are  also  capable  of  learning  meta-level 
interactions in the search space. Dynamic reinforcement learning
algorithms (Q-learning) (Sandholm and Crites, 1995) have been used in
multi-agent systems to help individual agents learn about the
environment and from other agents. Learning organizational roles in
heterogeneous multi-agent systems has been addressed in the L-TEAM
(Prasad et al., 1995) framework.

Exploiting  past  experience  from  intelligent  agent 
interactions  has  been  studied  in  the  context  of  multi-agent 
Case-Based  Reasoning  (CBR)  (Ketter  and  Hendler,  1994). 
The  negotiated  retrieval  technique  (Prasad  et  al.,  1996) 
retrieves  and  assembles  "case  pieces"  from  multiple  sources  in 
a  corporate  memory  to  create  an  overall  problem  case. 
Federated  peer  learning  (Prasad  and  Plaza,  1996)  uses  two 
models  of  cooperation:  Distributed  Case-Based  Reasoning 
(DisCBR)  and  Collective  Case-Based  Reasoning  (ColCBR). 
In  DisCBR,  an  agent  can  delegate  its  authority  to  another 
agent  to  solve  a  problem.  In  ColCBR,  an  agent  maintains  its 
authority  while  exploiting  the  experience  of  a  peer  agent. 

3.0  EMERGENT  BEHAVIOR  AND  LEARNING 

The  multi-agent-based  manufacturing  system  being  used 
for  the  present  work  comprises  several  design  subsystems  and 
resource  communities  (virtual  factories)  of  shop-floor 
resources.  Each  design  subsystem  provides  feature-based 
capabilities  through  a  part  agent  and  various  feature  agents 
(Balasubramanian  et  al.,  1996).  The  part  agent  is  the 
repository  for  both  product  data  and  relationships  among 
feature  agents.  Each  part  has  a  set  of  features,  which  in  turn 
have  technological  requirements  that  help  structure 
relationships  with  shop-floor  resources.  For  each  feature  agent 
in  the  part,  resource  communities  respond  with  feasible 
manufacturing  plans  and  costs.  Products  are  represented  as 
part  agents,  which  are  containers  for  various  types  of  features. 

The  manufacturing  system  needs  to  be  dynamically 
configured  to  support  changes  in  batch  sizes,  product  mix,  or 
design  plans,  which  precipitate  production  setup  changes. 
Extensive  search  processes  are  required  to  identify  the  multi- 
agent  interactions  for  each  problem  case.  These  interactions 
are  an  emergent  property  that  evolves  concurrently  within 
each  level  in  the  organization  for  different  types  of 
requirements, constraints, and agents (Maturana and Norrie, 1996).

For  each  new  problem  case,  the  whole  space  of 
interactions  is  investigated  using  a  restricted  search  approach 
similar  to  a  distributed,  local  breadth-first  search.  This  search 
mechanism  achieves  a  global  optimum  but  is  computationally 
expensive  and  involves  non-trivial  overheads,  particularly  for 
large multi-agent populations. The learning mechanism presented here
enables more efficient use of the system's behavior, thereby reducing
overheads.

Since the manufacturing system is highly dynamic, with numerous
elements and constraints, it is very difficult (generally impossible)
to establish an optimum generic procedure for manufacturing tasks
using static knowledge about manufacturing resources. However,
properly identifying dynamic interactions among agents and the
emergent behavior of the agent group as a whole facilitates a
near-optimal solution for these tasks.

Organization  of  resource  communities  occurs  at  two 
levels:  macro  and  micro.  Macro-level  organization  is  static, 
based  on  knowledge  about  closely  related  agents  that  are 
distinct  from  others  and  can  be  physically  separated.  As 
shown  in  Figure  1,  such  communities  can  be  made  of  part, 
inventory,  machine,  tool,  and  transportation  agents. 



Figure  1:  Macro-Level  Communities 


Figure  3:  Inter-Community  Interactions 


Within  a  macro-level  community,  heterogeneous  agents 
form  micro-level  communities  whose  composition  is  not  static 
and  therefore  unknown.  Such  micro-level  communities  are 
determined  by  predominant  behavioral  characteristics 
(grouped  in  classes),  see  Figure  2.  These  communities 
typically  contain  primary  or  secondary  process  machine 
agents,  design  features,  or  tools. 

There  are  three  types  of  interaction  among  various  agents 
within  a  system.  The  first  type  lies  entirely  inside  the 
boundaries  of  a  micro-level  community.  As  shown  in  Figure 
2,  interactions  of  the  second  type  involve  two  or  more  micro- 
level  communities  within  a  macro-level  community.  In  the 
third  type,  interactions  span  two  or  more  macro-level 
communities,  as  shown  in  Figure  3.  The  learning  model 
presented  here  considers  all  these  organizational  possibilities. 
4.0 LEARNING MODEL

The  system's  learning  focuses  on  the  transactions 
between  the  design  system  and  the  resource  communities  that 
result  in  promissory  plans.  These  promissory  plans  include 
relationships  among  manufacturing  resources  ordered 
according  to  precedence  constraints  and  processing  times 
needed.  For  a  given  set  of  features,  there  are  appropriate 
combinations  of  resources  (resource  networks)  for  the 
manufacturing  operations.  The  composition  of  the  resource 
network  varies  with  time,  since  availability  and  service  costs 
of  resources  are  dynamic  parameters  affecting  the  decision- 
making processes.  Thus,  the  learning  model  requires  two  main 
classification  functions,  one  feature-based  and  the  other 
resource-based. 

Feature-based  classification  creates  subcategories  of 
features  that  are  progressively  recognized  by  mediator  agents. 
These  subcategories  are  associated  with  specific  groups  of 
resources  and  emergent  behaviors  that  define  the  relationship 
between  the  features  and  the  resources.  Similarly,  the 
resource-based  function  classifies  resources  into  subgroups 
associated  with  specific  families  of  features  and  emergent 
behaviors. 

A  behavioral  pattern  includes  knowledge  about  micro- 
level  communities,  their  constituent  agents,  dynamic 
associations  among  those  agents  for  a  specific  time,  work 
load,  product  type,  and  product  mix.  These  patterns  recognize 
how  agents  organize  themselves  into  virtual  clusters  that 
define  the  knowledge  hierarchies  through  which 
manufacturing  tasks  can  be  processed. 

Initially,  the  multi-agent  manufacturing  system  operates 
in  a  "blind  space"  and  every  resource  in  it  is  involved  in 
decision-making  activities.  The  system,  lacking  knowledge 
regarding  behavioral  patterns  followed  by  agents  during  the 
decision  process,  is  forced  to  carry  out  a  global  search  for 
unknown  patterns.  At  the  beginning,  generalized  broadcasting 
is  used  to  inform  resource  agents  about  the  manufacturing 
tasks  and  their  requirements.  Subsequently,  dynamic 
hierarchies  of  coordination  clusters  are  formed  and  selective 
broadcasting  can  be  used. 
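
The shift from generalized to selective broadcasting can be sketched
as follows. The class name Mediator, its methods, the keying of
learned patterns by feature type, and the agent names are assumptions
made for illustration; the paper does not give this level of detail.

```python
from typing import Dict, Iterable, List, Set


class Mediator:
    def __init__(self, all_agents: Iterable[str]):
        self.all_agents: List[str] = list(all_agents)
        # Learned patterns: feature type -> agents seen in earlier plans.
        self.patterns: Dict[str, Set[str]] = {}

    def recipients(self, feature: str) -> List[str]:
        known = self.patterns.get(feature)
        if not known:
            # "Blind space": no pattern yet, so inform every resource agent.
            return list(self.all_agents)
        # Selective broadcasting to the learned coordination cluster.
        return [a for a in self.all_agents if a in known]

    def learn(self, feature: str, participants: Iterable[str]) -> None:
        # Record which agents took part in the plan for this feature.
        self.patterns.setdefault(feature, set()).update(participants)


mediator = Mediator(["mill-1", "mill-2", "drill-1", "lathe-1"])
first = mediator.recipients("hole")
mediator.learn("hole", ["drill-1", "mill-2"])
second = mediator.recipients("hole")
```

The first request for a feature reaches every agent, while later
requests reach only the agents associated with that feature so far.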

The  behavioral  knowledge  gained  through  this  learning 
process  can  be  used  in  two  ways  to  improve  system 
performance.  When  a  new  task  arises,  a  community  mediator 
has  to  identify  related  macro-  and  micro-level  communities  of 
agents  for  initial  broadcasting  and  for  subsequent  formation  of 
coordination  clusters.  Behavioral  knowledge  facilitates  this 
selection  of  agents  for  coordination  clusters. 

The  second  benefit  relates  to  how  queries  are  propagated 
among  coordination  clusters.  Dynamic  mediators  can  retrieve 
the  performance  history  of  agents  to  rank  the  agents  in  the 
cluster.  Agents  with  higher  rank  are  selected  to  propagate 
sublevel  clusters.  In  this  manner,  dynamic  mediators  replace  a 
naive  search  with  a  distributed  depth-first  search,  by  inhibiting 
the  formation  of  coordination  clusters  by  agents  ranked  below 
threshold  limits.  Only  if  higher  ranked  clusters  fail  will  lower 
ranked  clusters  be  formed. 


This  enhanced  search  process  is  shown  in  Figure  4. 
Ranking  ranges  from  1  to  10  and  there  are  only  two  subranges 
(higher  and  lower),  with  a  threshold  value  of  5.  All  the 
clusters  ranked  above  the  threshold  value  are  allowed  to 
propagate  at  every  level.  The  lower  ranked  ones  are  allowed 
to  continue  at  each  level  only  if  the  higher  ranked  clusters  fail. 
This  amounts  to  distributed  backtracking  without  explicit 
control. 
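The threshold rule described above can be sketched as follows. This is a minimal, sequential illustration rather than the distributed implementation used in the system: the `Cluster` class, the `is_solution` callback, and the `visited` trace are hypothetical names introduced here, and only the 1 to 10 rank range and the threshold of 5 come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

THRESHOLD = 5  # from the text: ranks run from 1 to 10, split at 5


@dataclass
class Cluster:
    """Hypothetical coordination cluster carrying a learned rank."""
    name: str
    rank: int
    children: List["Cluster"] = field(default_factory=list)


def search(cluster: Cluster,
           is_solution: Callable[[Cluster], bool],
           visited: Optional[List[str]] = None) -> Optional[List[str]]:
    """Return a path of cluster names leading to a solution, or None.

    Sub-clusters ranked above THRESHOLD are expanded first; the
    lower-ranked ones are expanded only if every higher-ranked
    branch fails.
    """
    if visited is not None:
        visited.append(cluster.name)  # record expansion order for inspection
    if is_solution(cluster):
        return [cluster.name]
    by_rank = sorted(cluster.children, key=lambda c: c.rank, reverse=True)
    high = [c for c in by_rank if c.rank > THRESHOLD]
    low = [c for c in by_rank if c.rank <= THRESHOLD]
    for group in (high, low):
        for child in group:
            path = search(child, is_solution, visited)
            if path is not None:
                return [cluster.name] + path
    return None
```

In this sketch the first solution found is returned, which is a simplification: in the system described here the ranking only orders propagation, and the winner is still decided by bidding.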

Since  the  propagation  of  coordination  clusters  defines  the 
interaction  among  agents  from  various  micro-level 
communities  and  the  distributed  search  process,  it  also  defines 
the  potential  domain  of  emergent  behaviors.  The  ranking 
defines  only  the  propagation  mechanism  and  has  no  influence 
on the outcome. Any combination of resource agents might
win the bidding process, regardless of their ranking.


Figure  4:  Distributed  Depth-First  Search 

The  ranking  keeps  changing  with  every  outcome, 
necessitating  continuous  learning.  Mediator  agents  possess  the 
requisite  meta-rules  to  accomplish  this.  Since  an  inherent 
characteristic  of  the  multi-agent  system  is  that  multiple  tasks 
may  be  solved  at  any  given  time,  multiple  distributed  search 
processes  can  be  carried  out  in  parallel.  The  distributed  depth- 
first  search  mechanism  helps  this  process  by  reducing  the 
computational  requirements  and  communication  bandwidth. 

Distributed parallel learning is then possible during the
construction of partial promissory plans. Each coordination
cluster contributes a set of partial relationships (among
the manufacturing resources) to solve a particular aspect of the
manufacturing requests. Partial relationships are associated
with both basic and dynamic costs, allowing cluster
coordinators to set priorities and control the
propagation of clusters. By aggregating the partial
relationships,  a  set  of  final  promissory  plans  is  established  and 
subsequently  refined  to  satisfy  other  dynamic  constraints  on 
the  shop  floor.  The  final  refinement  provides  the  best  possible 
production  sequence  for  the  product,  including  relationships 
that  link  machine  agents  and  tool  agents  into  an  emergent 
organization  for  it.  Subsequently,  promissory  plans  are  used  to 
define  the  best  possible  schedules  for  the  resources  in  each 
relationship, yielding the best overall promissory plan
(Maturana et al., 1997).

In recent experiments, the time required for distributed
planning in the system fell by a factor of between 100 and
500 once learned relationships were used.

5.0  LEARNING  FROM  HISTORY 

The emergent relationships obtained during the search, which
encode the solution plans for partial aspects of the
manufacturability request, are related to the individual
features of the product's design. Depending on its
category, geometric characteristics, and finishing
requirements, each feature is virtually linked to resources.
For a single feature there may be several relationships, both
technological and dynamic. The virtual links vary dynamically
according to the status of the resources on the shop floor.

Encoding  these  emergent  relationships  for  manufacturing 
is  based  on  distributed  machine-tool  information.  A  machine 
resource  may  require  one  or  several  tools  to  either  partially  or 
thoroughly  meet  the  requirements  for  a  single  feature.  This 
information  is  dynamically  captured  in  coordination  clusters 
and  sent  to  the  community  mediator  for  encoding  and  storing 
in  information  structures.  Each  community  may  implement 
several  of  these  learning  systems  to  encode  different 
behaviors.  After  confronting  the  learning  system  with  a 
previously  recognized  product,  a  set  of  relationships  is 
obtained  and  accumulated.  At  this  stage,  the  mediator  agent 
uses  the  macro-level  static  classification  of  resources  to  create 
a  hierarchical  structure  (i.e.,  primary,  secondary,  tertiary,  etc.) 
that  satisfies  precedence  constraints. 

As  the  system  learns  dynamically  from  interactions, 
various  emergent  relationships  define  alternatives  for  the 
manufacturing  sequence,  each  partitioned  into  an  owner- 
dependent  link  to  establish  a  reduced  group  of  resource 
agents.  An  owner  is  an  entity  linked  to  a  set  of  dependent 
entities.  Conversely,  a  dependent  is  an  entity  owned  by  a 
higher  level  entity.  The  primary-level  owner-dependent  link 
initiates  a  primary-level  coordination  cluster,  which  is  then 
provided  with  the  other  owner-dependent  links  for  forming 
other  sublevel  clusters  of  either  machine  or  tool  agents.  Each 
sublevel  cluster  is  also  provided  with  the  link  information 
associated  with  its  searching  domain.  At  this  point,  the 
mediator  agent  can  structure  a  feasible  search  region  for  a 
request  and  broadcast  to  selected  agents  to  generate 
promissory  plans  and  costs. 

6.0  LEARNING  FROM  THE  FUTURE  THROUGH 
FORECASTING 

Forecasting  the  system's  behavior  consists  of  capturing 
the  state  of  the  multi-agent  system  (plans)  and  simulating  it 
for  a  period  of  time  into  the  future.  The  forecasting  process 
simulates  the  behavior  of  the  virtual  model,  which  emulates 
the  shop-floor  activities.  Since  the  agents'  commitments  are 
generated  at  a  promissory  level  and  based  on  distributed 
estimations  of  the  system's  real  state,  this  process  is  a  high- 
level  simulation  activity  that  propagates  this  behavior  into  its 
future  environment.  Machine  mediators  monitor  the  system's 
status by conducting global performance evaluations. Various
forecasting triggering parameters are used based on system
load, extension of the scheduling horizon, and adjustment
periodicity. These parameters are quantified and combined
according to the following criteria:

1. System load ($R_l$): This parameter is set to promote a
balanced work load among the resources. The number of new
jobs allocated is measured to calculate a load ratio ($R_l$), which
is then checked against a load threshold. An $R_l$ greater than this
load threshold satisfies the first condition for triggering the
simulation forecasting process. This condition is
complemented by the adjustment periodicity parameter, as
shown in Equation 1:

$$R_l = \frac{\text{Number of new jobs allocated}}{\text{Total number of jobs allocated}} \qquad (1)$$

If $R_l > \text{Threshold}_l$
Then
    If (Periodicity)
    Then Run Forecasting
    Else Continue

By  considering  the  natural  adjustment  of  schedules  after 
forecast  scheduled  times,  resources  may  open  availability  slots 
at  previously  occupied  positions. 

2. Extension of the scheduling horizon ($R_s$): This
parameter changes the configuration of schedules to vary job
priorities. Since schedule adjustments either extend or contract
the scheduling horizon, lateness and earliness penalizations
affect the scheduling differently. In order to extend the
scheduling horizon, new job allocations are more heavily
penalized if the jobs approximate the due-date boundaries. To
contract the scheduling horizon, resources that contract their
scheduling charts may allocate more jobs at a low penalization
rate. This parameter is shown in Equation 2.

$$R_s = \frac{(\text{Completion Time})_{max}}{(\text{Due Date})_{max}} \qquad (2)$$

If $R_s > \text{Threshold}_s$
Then
    If (Periodicity)
    Then Run Forecasting
    Else Continue

3. Periodicity ($R_p$): This parameter regulates the time
delay between forecasting events. It may trigger the
forecasting event either alone or in combination with one of
the two parameters above. When $R_p$ acts alone, the number of
new allocations must be greater than zero. A time periodicity
threshold is also provided to quantify the periodicity, as
shown in Equation 3.

If $R_p > \text{Threshold}_p$
Then
    If (number of new allocations) > 0    (3)
    Then Run Forecasting
    Else Continue

If ($R_p$ and $R_l$) or ($R_p$ and $R_s$)
Then Run Forecasting
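
The three triggering criteria can be combined in a few lines of code. The sketch below is only an illustration under stated assumptions: the `TriggerState` fields, the function names, and the reading of $R_p$ as the time elapsed since the last forecasting event are hypothetical, and the threshold values are left to the caller.

```python
from dataclasses import dataclass


@dataclass
class TriggerState:
    """Hypothetical snapshot of the quantities used in Equations 1 to 3."""
    new_jobs: int          # number of new jobs allocated
    total_jobs: int        # total number of jobs allocated
    max_completion: float  # (Completion Time)_max
    max_due_date: float    # (Due Date)_max
    elapsed: float         # ASSUMPTION: R_p is time since the last forecast


def load_ratio(s: TriggerState) -> float:
    """R_l from Equation 1."""
    return s.new_jobs / s.total_jobs if s.total_jobs else 0.0


def horizon_ratio(s: TriggerState) -> float:
    """R_s from Equation 2."""
    return s.max_completion / s.max_due_date if s.max_due_date else 0.0


def should_forecast(s: TriggerState, thr_load: float,
                    thr_horizon: float, thr_period: float) -> bool:
    """Apply the rules: periodicity is required in every case."""
    if not s.elapsed > thr_period:           # R_p condition not met
        return False
    load_ok = load_ratio(s) > thr_load        # (R_p and R_l)
    horizon_ok = horizon_ratio(s) > thr_horizon  # (R_p and R_s)
    alone_ok = s.new_jobs > 0                 # R_p acting alone (Equation 3)
    return load_ok or horizon_ok or alone_ok
```

For example, `should_forecast(TriggerState(0, 10, 120.0, 100.0, 50.0), 0.3, 1.0, 30.0)` fires on the scheduling-horizon condition even though no new jobs were allocated.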
The system's performance is measured through a
composite function that combines average flow time,
maximum completion time, and number of late jobs, as shown
in Equation 4.

$$P = w_f F_J + w_c C_{max} + w_l L_J \qquad (4)$$

where:

$w_f$ - average flow time weight
$w_c$ - maximum completion time weight
$w_l$ - late jobs weight
$F_J$ - average flow time
$C_{max}$ - maximum completion time
$L_J$ - number of late jobs
$J$ - number of jobs completed

Each performance value is weighted according to
management policies. The composite function is not restricted
to these performance values. The performance ratio ($R_{metrics}$) is
calculated as follows:

$$R_{metrics} = \frac{P'}{P} \qquad (5)$$

$P'$ - forecasting simulation performance
$P$ - system performance before adjustment

Lower and upper thresholds ($-\sigma$, $+\sigma$) are used to
define acceptance and rejection performance-ratio regions.
The system's state of commitments remains unaltered if ratios
fall within the thresholds.
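
As a rough illustration of Equations 4 and 5, the sketch below computes the composite performance, the performance ratio, and the acceptance decision. The function names and default weights are hypothetical, and the acceptance band is an assumption: the text does not say where the thresholds are centered, so the band is taken here to be symmetric around a ratio of 1.

```python
def composite_performance(avg_flow_time: float,
                          max_completion_time: float,
                          late_jobs: int,
                          w_f: float = 1.0,
                          w_c: float = 1.0,
                          w_l: float = 1.0) -> float:
    """Equation 4: weighted sum of the three performance values."""
    return w_f * avg_flow_time + w_c * max_completion_time + w_l * late_jobs


def performance_ratio(p_forecast: float, p_current: float) -> float:
    """Equation 5: forecast performance over performance before adjustment."""
    return p_forecast / p_current


def keep_commitments(ratio: float, sigma: float) -> bool:
    """Return True if the current state of commitments is left unaltered.

    ASSUMPTION: the (lower, upper) thresholds form a band of half-width
    sigma centered on a ratio of 1.
    """
    return abs(ratio - 1.0) <= sigma
```

With illustrative inputs, `composite_performance(10, 100, 2, w_f=0.5, w_c=0.25, w_l=5)` gives 40.0; a forecast value of 44.0 then yields a ratio of 1.1, which falls inside a band with `sigma=0.2`, so the commitments would be kept.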

The  forecasting  simulation  period  is  established  by  the 
system  manager  according  to  high-level  policies.  However, 
the  determination  of  the  simulation-control  parameter  may  be 
customized  to  automatically  adapt  to  the  system's 
requirements.  Two  main  approaches  can  be  used  to  establish 
the  forecasting  period: 

1. The forecasting simulation may be run for a period of
time long enough to permit the completion of every
committed job. For long simulation periods, however, this
criterion accumulates deviations, since several jobs may arrive
in the system that affect intermediate allocation slots. Because
these slots are not preempted during simulation, this approach
only produces rough estimates.

2. The second approach attempts to avoid the
accumulation of deviations by running the
forecasting simulation for short periods of time only. This
approach is more accurate, since the duration of the simulation
is much shorter than the time between job arrivals. Here, the
multi-agent system can easily be adjusted while including the
intermediate requirements.

Either approach produces an estimated performance
value, which is used to decide upon the refinement of the
system's commitments. Figure 5-A illustrates the variation in
system performance. Here, the system performance for two
consecutive simulation periods is represented by $f_1$ and $f_2$, and
the variation in performance for the forecast model is
represented by $f'_1$ and $f'_2$. It is assumed that the simulation-
triggering parameters are satisfied at period 1. At the end of
the simulation, both performances (i.e., $f_1$ and $f'_1$) are
compared through a performance ratio function. In this model,
the system behavior is very close to the simulated behavior;
therefore, promissory planning is being developed within an
acceptable range. Thus, the system continues to plan using the
current state of commitments.

Figure 5-B shows a second forecasting simulation occurring
at period 2. Since the performance ratio between $f_2$ and $f'_2$ is
sufficiently large to exceed the performance threshold, the
system decides to adopt the forecast scheduled times,
which replace the current scheduled times. The state of
commitments is modified at each planning level affected by
these changes. For the next period (period 3) of the
forecasting simulation, the performances are sufficiently
close, so the current state of commitments is retained.

[Figure 5, panels A and B: Performance plotted against Simulation Triggering]

Figure 5: System Performance Variation

6.1  Experimental  Results  for  Learning  from  Forecasting 

A product mix comprising 100 units of each of three part
types (Bearing cover, EMI housing, and Guide) was
"manufactured" in two shop-floor areas, using an AGV
transportation system to handle raw materials and
semi-finished products. Shop floor #1 had 3 vertical machining
centers, 2 internal grinders, 1 surface grinder, and 43 tools.
Shop floor #2 had 3 vertical machining centers and 35 tools. A
profit margin of 35% of normal production cost was
specified.

Once  the  production  orders  were  placed  in  the  design 
system  interface,  the  static  mediator  made  repeated  broadcast 
requests  to  each  shop  floor  involved.  Since  each  offered  a 
different  cost,  the  static  mediator  applied  a  final  integration  of 
plans  to  determine  the  best  one  to  manufacture  the  products. 
The final schedule data showed that 190 parts were produced
in Shop floor #1 and 110 parts in Shop floor #2. Shop floor #2
only  manufactured  EMI  housing  and  Guide  products.  The 
Bearing  cover  product  was  only  produced  at  Shop  floor  #1. 

Table  1  shows  combined  performance  metrics  of  Average 
Flow  Time  (AFT),  Maximum  Completion  Time  (MCT),  and 
Transportation  Cost  (TC)  for  both  shop  floors.  Only  8  jobs  in 
all  were  tardy. 


Table 1: Experiment Results

AFT (min.)    MCT (min.)    TC ($) (Max.)    TC ($) (Min.)
414.06        7164.81       50.21            25.39

The forecasting simulation proved to be a robust system
for adjusting and enhancing performance. The forecasting was
triggered 8 times through the entire planning process, as
shown in Figure 6.


[Figure 6 plot: Promissory and Forecast series plotted against Simulation Triggering (1 to 8)]

Figure 6: Triggering Thresholds


7.0  CONCLUSION 

A learning mechanism has been developed that identifies
organizational knowledge in multi-agent manufacturing
systems and propagates interactions selectively, based on
emergent system behavior. This mechanism enhances coordination
capabilities  by  minimizing  communication  and  processing 
overhead,  and  facilitates  distributed,  parallel  depth-first 
search.  Though  this  learning  model  has  been  implemented  in  a 
distributed  mediator  architecture  that  is  part  of  a  concurrent 
design  and  manufacturing  system,  it  is  generic  and  can  be 
applied  to  other  areas  as  well. 

A  learning  from  forecast-futures  technique  has  also  been 
developed  to  dynamically  adjust  distributed  schedules  and 
planning  in  a  multi-agent  manufacturing  system. 
Experimental  results  show  the  value  of  this  approach  for 
adjusting and enhancing performance.

ABSTRACT

Evolutionary algorithms provide a framework for studying the
evolution of symbols in dynamic environments. This paper offers a
brief review of related efforts in evolutionary computation involving
the invention of a language of descriptors, followed by preliminary
experiments in evolving symbols describing an environment to be sent
between a sender and a receiver. The results indicate that the more
precision is offered in the description of the environment, the
greater the possible accuracy; however, the greater the number of
symbols, the more challenging the task of decoding the symbols. Thus
there is a trade-off between precision and accuracy.


KEYWORDS: symbols, communication, evolutionary computation

1.  INTRODUCTION 


Living organisms are predictors. All evolved biota continually
predict features of their surroundings. The essential criterion of
prediction is to minimize the cost of being surprised [1]. Those
that fail to predict with sufficient accuracy or in a timely manner
often face lethal consequences. The keystone of prediction in
evolutionary systems may be taken even more broadly as a requisite
component of all intelligent systems: Intelligence may be
defined as the capability of a system to adapt its behavior to
meet its goals in a range of environments [2]. By consequence,
intelligent behavior requires a composite ability to predict one's
environment coupled with a translation of each prediction into a
suitable response in light of a given goal [3]. Without prediction,
adaptation becomes impossible.

In nature, adaptation is accomplished through evolution, a two-
step process of random variation and selection [4]. Individuals
compete for resources in a finite arena. Natural selection
stochastically eliminates those individuals that do not acquire
sufficient resources. The primary purposeful goal imbued into all
living systems is therefore survival. The recognition of repeated
patterns in natural circumstance with appropriate reactions (i.e.,
allocation of available resources) leads to greater rates of
survival. Under the assumption that such behavioral characteristics
are heritable, the logical outcome is a series of individuals that
are successively better predictors of regularities in their
environment.

The search for underlying regularities in nature is also
primary to the scientific method. Indeed, there is a fundamental
connection between the process of evolution and the scientific
process of discovery [3] (Figure 1). In nature, individual
organisms serve as hypotheses concerning the logical properties of
their environment. Their behavior is an inductive inference
concerning some as yet unknown aspects of that environment.
Validity is demonstrated by their survival. Over successive
generations, organisms become better predictors of their
surroundings.

[Figure 1 diagram; legible labels: Model hypothesis, Deductive manipulation]

Figure 1. The scientific method. Unknown aspects of the
environment are estimated. Data are collected in the form
of previous observations or known results and combined
with newly acquired measurements. After the removal of
known erroneous data, a class of models of the environment
that is consistent with the data is generalized. This
process is necessarily inductive. The class of models is then
generally reduced by parameterization, a deductive process.
The specific hypothesized model (or models) is then tested
for its ability to predict future aspects of the environment.
Models that prove worthy are extended, or combined to form
new hypotheses that carry on a "heredity of reasonableness"
[2]. This procedure is iterated until a sufficient level of
credibility is achieved (from [2]).

These surroundings include other organisms, and thus
communication between individuals can affect the likelihood of
survival. Communication requires a language, a sequence of
symbols transmitted between a sender and a receiver. The symbols
can be transmitted through any of the senses. Common symbols
in biota include threat postures, mating dances, grunts and
whistles (or other sounds), disagreeable taste, and so forth. It is
of interest to speculate on the evolution of such descriptive
symbols and sequences of symbols.

This paper offers some speculations on the evolution of
symbols as observed in recent efforts in evolutionary computation.
The paper does not purport to provide a suitable literature
review of efforts to explain the evolution of symbols in humans,
other primates, or indeed any other natural system. Readers
interested in these areas are referred to [5,6] and others.

2.  EVOLUTIONARY  COMPUTATION, 
SYMBOLS,  AND  COMMUNICATION 

2.1  Background 

Evolutionary computation concerns efforts to simulate
evolution on a computer (or other device). Efforts in simulated
evolution date back to the early 1950s and 1960s, with three main
lines of investigation proceeding currently: evolution strategies,
evolutionary programming, and genetic algorithms. For a
review of these methods, see [2], [7].

Within evolutionary computation, there have been efforts made
to study the evolution of symbols and communication. Werner
and Dyer [8] used artificial neural networks to represent the
behavior of artificial organisms in the context of finding mates.
"Females" were given the ability to see males and emit sounds.
"Males" were blind, but could hear signals from females. Over
successive generations, starting from randomly configured
networks, the simulation generated nets that exhibited an
increasing effectiveness in mate finding, as well as the evolution of
dialects that promote the development of "subspecies." Levin
[9] used a genetic algorithm approach to study the evolution of
understanding between artificial agents having various
"internal states" and "observable features." Fitness was increased for
greater understanding of the internal states by other observed
agents. Figure 2 offers a flow chart of Levin's procedure.

Whereas [8] and [9] were designed specifically to examine
the evolution of symbols^1, other evidence of the evolution of
symbols can be found by reexamining experiments in the
iterated prisoner's dilemma [10].

2.2  The  Iterated  Prisoner's  Dilemma  and  the 
Evolution  of  Symbols 

The prisoner's dilemma is an easily defined nonzero-sum,
noncooperative game. The term nonzero-sum indicates that
whatever benefits accrue to one player do not necessarily imply similar
penalties imposed on the other player. The term noncooperative
indicates that no preplay communication is permitted between
the players. Typically, two players are involved, each having
two alternative actions: cooperate or defect. Cooperation
implies increasing the total gain of both players; defecting implies
increasing one's own reward at the expense of the other player.
The optimal policy for a player depends on the policy of the
opponent.

Figure 2. Flowchart of procedure in [9] to evolve
communication based on how well other agents understand the
offered signals regarding internal states (from [9]).

The  general  form  of  the  game  is  offered  in  Figure  3.  The  game 
is conducted over a series of trials. If information regarding prior
plays can affect the decision making in future plays, learning
can take place and, even though defection is the dominant move
in any single trial, cooperative behavior can evolve.

Following the seminal efforts of Axelrod [11], Fogel [10]
evolved finite state machines to play the iterated prisoner's
dilemma. The initial interest was to determine if population size
had any association with the speed at which a population evolved
mutually cooperative behavior. The results were essentially
negative: population size did not appear strongly correlated with the
number of generations required to evolve cooperation.
Population sizes of 50, 100, 250, 500, and 1000 parents were
considered. In each case, across several trials (up to 30), cooperation
always evolved. Figure 4 depicts one evolved finite state
machine from these experiments for a population size of 50.

Figure 3. The general form of the payoff function in the prisoner's
dilemma, where $\gamma_1$ is the payoff to each player for mutual
cooperation, $\gamma_2$ is the payoff for cooperating when the other player
defects, $\gamma_3$ is the payoff for defecting when the other player
cooperates, and $\gamma_4$ is the payoff for mutual defection. An entry
$(\alpha, \beta)$ indicates the payoffs to the players A and B, respectively.
The constraints $2\gamma_1 > \gamma_2 + \gamma_3$ and
$\gamma_3 > \gamma_1 > \gamma_4 > \gamma_2$ are typical.
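
The payoff structure captioned in Figure 3 can be made concrete with a small sketch. The function names below are hypothetical, and the numeric payoffs (3, 0, 5, 1) are illustrative values commonly used for this game, not values taken from the experiments in [10]. The code checks the two typical constraints and confirms that defection is the dominant move in a single trial.

```python
from typing import Dict, Tuple

Payoffs = Dict[Tuple[str, str], Tuple[float, float]]


def make_payoffs(g1: float, g2: float, g3: float, g4: float) -> Payoffs:
    """Build the payoff table keyed by (move of A, move of B).

    g1: mutual cooperation, g2: cooperating against a defector,
    g3: defecting against a cooperator, g4: mutual defection.
    """
    if not (g3 > g1 > g4 > g2):
        raise ValueError("ordering constraint violated")
    if not (2 * g1 > g2 + g3):
        raise ValueError("mutual cooperation must beat alternation")
    return {
        ("C", "C"): (g1, g1),
        ("C", "D"): (g2, g3),
        ("D", "C"): (g3, g2),
        ("D", "D"): (g4, g4),
    }


def defection_dominates(payoffs: Payoffs) -> bool:
    """True if D pays player A more than C against either opponent move."""
    return all(payoffs[("D", opp)][0] > payoffs[("C", opp)][0] for opp in "CD")


# Illustrative values only (an assumption, not from the source).
PAYOFFS = make_payoffs(3, 0, 5, 1)
```

The second constraint is what makes sustained mutual cooperation better for the pair than taking turns exploiting each other, which is why cooperation can be worth evolving despite the single-trial dominance of defection.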

It was of interest to compare the behavior that occurs when
finite state machines from independent trials play each other to
that observed when they play copies of themselves. Fogel [10]
(also see [2]) reported that when each finite state machine
chosen from a trial at any population size played against itself, it
always eventually cycled into mutual cooperation. But when it
played against a machine selected from a different independently
evolved population, it was equally likely to cycle into
alternating cooperation and defection (Figure 5).
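
The contrast between self-play and play against an unfamiliar machine can be reproduced with two hand-written machines. This is a toy sketch under stated assumptions: `FSM`, `play`, `PROBER`, and `TIT_FOR_TAT` are hypothetical constructions for illustration and are not the machines evolved in [10]. Each transition is keyed on the current state and the previous pair of moves.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Move = str                    # "C" (cooperate) or "D" (defect)
Key = Tuple[int, Move, Move]  # (state, own last move, opponent's last move)


@dataclass
class FSM:
    start_state: int
    first_move: Move
    table: Dict[Key, Tuple[int, Move]]  # key -> (next state, next move)


def play(a: FSM, b: FSM, rounds: int) -> List[Tuple[Move, Move]]:
    """Play the iterated game and return the list of (move_a, move_b)."""
    sa, sb = a.start_state, b.start_state
    ma, mb = a.first_move, b.first_move
    history = [(ma, mb)]
    for _ in range(rounds - 1):
        # Both machines react to the same previous pair of moves.
        sa, next_a = a.table[(sa, ma, mb)]
        sb, next_b = b.table[(sb, mb, ma)]
        ma, mb = next_a, next_b
        history.append((ma, mb))
    return history


# Hypothetical machine 1: defect once, cooperate once, then echo the opponent.
PROBER = FSM(0, "D", {
    **{(0, own, opp): (1, "C") for own in "CD" for opp in "CD"},
    **{(1, own, opp): (1, opp) for own in "CD" for opp in "CD"},
})

# Hypothetical machine 2: cooperate first, then echo the opponent.
TIT_FOR_TAT = FSM(0, "C", {
    (0, own, opp): (0, opp) for own in "CD" for opp in "CD"
})
```

In this toy setting, `play(PROBER, PROBER, 5)` settles into mutual cooperation after the opening defection, while `play(PROBER, TIT_FOR_TAT, 6)` locks into alternating cooperation and defection because the opening moves of the two machines do not match.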

It is speculative but reasonable to presume that each
population of finite state machines evolved initial sequences of
symbols that form patterns that can be recognized by other machines.
This allows for the identification of machines that will tend to
respond to cooperation with cooperation. The specifics of such
sequences will vary by trial and may be as simple as merely
cooperating on the first move. But the specifics are generally
unimportant. When two machines from separate trials played
against each other, the resulting behavior could vary because no
pattern of initial symbols was recognized by either player. Thus
it would be improper to describe the machines as being
"cooperative" or "noncooperative" as such descriptions require
qualification with regard to the circumstances in which they were
evolved and the expected behavior of the other players. Each of
the machines tested for the results presented in Figure 5 was
cooperative in its naturally evolved population.

[Figure 4 diagram. Legend: S - Start State, C - Cooperate, D - Defect. Legible transition labels: C,C/C; D,C/C; D,D/D]

Figure 4. The best evolved finite state machine after 25
generations for a population size of 50 parents from [10]. If this
machine were played against itself, it would eventually cycle into
state 1, and always cooperate (C).

3.  EXPERIMENTS 

Of potential interest is the manner in which symbols are invented
on-the-fly in light of competition from other extant individuals.
Two simple experiments have been conducted to initiate an
investigation into this process. An environment is sensed and
encoded by an individual into a binary string of some length. This
string is then sent to another individual that must decode the
string and recreate the sensed environment or predict the next
state of the environment. The quality of communication and
encoding/decoding is judged in terms of the correspondence
between the desired response of the receiver and the actual
response. The binary encoding, however, is made evolvable in that
the number of bits used to describe the environment is a
parameter of the sender, and is subject to random variation.

More specifically, in the first trial, a population of 250 sender-
receiver pairs was initialized. Each sender and corresponding
receiver took the form of partitioned neural networks, with the
sender using threshold functions for hidden units (Figure 6). The
sensed input was a sine wave sampled 100 times over a single
period. The task was to encode the sensed value of the sine wave,
have the receiver decode the value, and recreate the sensed value
over the 100 samples. All senders were initialized to have a single
hidden node. This could be varied by mutation, with a 0.25
probability of adding a node (up to a maximum of 10 nodes) and a
0.25 probability of deleting a node (down to a minimum of 1
node). All initial connections were initialized uniformly over
[-0.5, 0.5] and were mutated using Gaussian perturbations with
mean zero and a variance equal to the mean squared error of the
sender-receiver pair divided by the number of degrees of
freedom in the pair. Each sender-receiver pair generated a single
offspring, which was evaluated, and then selection was imposed
to probabilistically retain the best 250 pairs in accordance with
competition rules in [2]. This process was iterated for 400
generations. Typical results are shown in Figure 7a. A second trial
used two successive inputs from the sine wave ($x[t-1]$, $x[t]$) and
required the receiver to predict the next sample ($x[t+1]$). Typical
results are shown in Figure 7b.

Figure 5. The results of playing the iterated prisoner's dilemma (a) with independently evolved machines in which each machine
plays against a machine that was created in a separate trial with a different number of parents in the population and (b) when each
independently evolved best machine plays against itself. The subscripts indicate the population size. Mutual cooperation is always
observed when finite state machines (FSMs) play against themselves, but not when they play separately evolved machines.
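
The sender side of the experiment can be sketched in a few functions. The sketch is illustrative only and makes several assumptions that the text does not settle: the names `encode` and `mutate_sender` are hypothetical, bias terms for new nodes are drawn from the same [-0.5, 0.5] range as connections, the add and delete mutations are treated as independent draws, and the receiver and the selection step from [2] are omitted.

```python
import math
import random
from typing import List, Tuple

MAX_NODES, MIN_NODES = 10, 1          # limits stated in the text
P_ADD, P_DELETE = 0.25, 0.25          # mutation probabilities from the text

# One period of a sine wave sampled 100 times (the sensed input).
SAMPLES = [math.sin(2 * math.pi * k / 100) for k in range(100)]


def encode(x: float, weights: List[float], biases: List[float]) -> List[int]:
    """Hard-limiting encoding: transmit 1 if w*x - b >= 0, otherwise 0."""
    return [1 if w * x - b >= 0 else 0 for w, b in zip(weights, biases)]


def mutate_sender(weights: List[float], biases: List[float],
                  mse: float, dof: int,
                  rng: random.Random) -> Tuple[List[float], List[float]]:
    """Create an offspring sender from a parent.

    Gaussian perturbations have mean zero and variance mse / dof;
    the number of encoding nodes may grow or shrink by one.
    """
    sigma = math.sqrt(mse / dof)
    w = [wi + rng.gauss(0.0, sigma) for wi in weights]
    b = [bi + rng.gauss(0.0, sigma) for bi in biases]
    if rng.random() < P_ADD and len(w) < MAX_NODES:
        w.append(rng.uniform(-0.5, 0.5))
        b.append(rng.uniform(-0.5, 0.5))  # ASSUMPTION: same range as weights
    if rng.random() < P_DELETE and len(w) > MIN_NODES:
        i = rng.randrange(len(w))
        del w[i]
        del b[i]
    return w, b
```

Each additional encoding node adds one bit to the transmitted symbol, so the length of `encode(...)` is the quantity whose evolution the experiments track.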

A general pattern covers typical results in both experimental designs.
There is a gradual increase in the number of evolved encoding nodes
coupled with a gradual decrease in the mean squared error. A greater
number of encoding nodes provides greater precision; however, there is
no immediate leap to the maximum number of possible nodes (10 in this
case, which could have been reached as early as the 10th generation).
The difficulty in establishing the correct encoding-decoding mix
appears to increase with the number of encoding nodes. Thus there is a
trade-off between precision and accuracy, and the appropriate
symbolization evolves over successive generations.
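
To make the notion of precision concrete, the hard-limiting encoding described for the sender (weight times input, minus a bias, thresholded at zero) can be sketched as below. This is an illustration under assumptions: the function names and the example weights and biases are hypothetical, and the receiver is not modeled.

```python
import math

def encode(x, weights, biases):
    """One bit per encoding node: 1 if w*x - b >= 0, otherwise 0."""
    return tuple(1 if w * x - b >= 0 else 0 for w, b in zip(weights, biases))

def distinct_symbols(weights, biases, samples=100):
    """Count the different bit patterns produced over one sampled sine period."""
    xs = [math.sin(2 * math.pi * k / samples) for k in range(samples)]
    return len({encode(x, weights, biases) for x in xs})
```

With a scalar input, n threshold nodes cut the input range into at most n + 1 intervals, so in this sketch a sender with n encoding nodes can emit at most n + 1 distinct symbols. Adding a node raises that ceiling, but the receiver must then learn to decode the finer code.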

Figure 6. The sender network in the evolutionary experiments. The
input is a sine wave with 100 samples per period. The connections from
the input node to the encoding nodes are weighted and can vary with
Gaussian mutation (zero mean). The encoding nodes multiply the
connection strength by the input value, subtract a bias term (implicit)
and pass the result through a hard limiting threshold function. If the
value is greater than or equal to zero, a 1 is transmitted, otherwise a
0 is transmitted. The receiver, by definition, has a number of input
nodes equal to the encoding nodes of the sender and, in the first
experiment, must essentially perform the inverse operation to recreate
the input value of the sender. In the second experiment, the sender has
two sensors and encodes two successive values of a sine wave. The
receiver must in turn predict the next value of the sine wave. The
number of encoding nodes is simultaneously subject to mutation and
selection along with the connection strengths and bias terms.

Figure 7. The progression of the mean number of encoding nodes (and
thus the choice for symbolization) taken over all surviving
sender-receiver pairs at each generation and the best mean squared
error that was obtained. (a) Typical results for encoding and decoding
a value from a sine wave. (b) Typical results for encoding and decoding
the last two samples of a sine wave in order to predict the next value.
Note that although the probability of adding nodes was equal to 0.25
there was no immediate jump to the maximum number of 10 nodes. Simply
adding nodes did not automatically lead to improved performance because
it requires simultaneous evolution of the appropriate connection
weights. Nevertheless, increased precision allowed for increased
accuracy.

4. CONCLUSIONS

Symbols evolve because they are useful. Evolutionary computation offers
a tool to study the evolution of useful symbols in a variety of
contexts. Previous efforts have identified patterns in the evolution of
symbols both in sending messages from sender to receiver, and in
abstracting meaning from an environment in light of a given goal. The
preliminary results offered here show a first step toward describing
the trade-off between the degree of precision offered by a particular
set of symbols and the accuracy that can be obtained when a receiver
must decode these symbols into a meaningful response. If the results
serve to promote further thought and experimentation in this direction
then they will have fulfilled the author's purpose.

6. FOOTNOTE

Related efforts in evolving symbolic strings representing sensed
environmental conditions can be found in [12, 13].

ABSTRACT

In  this  paper,  we  speculate  that  from  both  a  semiotic 
perspective  and  from  a  perspective  of  intelligence  as  given 
in  Albus'  Outline  of  a  Theory  of  Intelligence  [2],  we  may 
cast  some  light  upon  the  early  stages  of  the  evolution  of 
intelligence  by  a  simple,  biologically  plausible,  model  of 
neural  control  circuitry.  This  model  implies  that  the 
ability  of  neurons  to  act  as  controllers  is  an  inherent 
property  of  neurons  and  that  this  inherent  control  is  a 
form  of  fuzzy  control.  In  employing  this  model  to  solve  a 
well-known  benchmark  control  problem  (the  inverted 
pendulum  on  a  cart  balancing  problem),  the  learning 
needed  to  set  the  few  weights  (and  parameters)  was 
accomplished  using  a  genetic  algorithm. 

KEYWORDS:  artificial  neural  net,  average 
firing  rate,  evolution. 

INTRODUCTION 

"The  brain  uses  stereotyped  electrical  signals  to 
process all the information it receives and analyzes. The
signals are symbols that do not resemble in any way the
external world they represent, and it is therefore an
essential task to decode their significance." These words
begin Nicholls, Martin and Wallace's From Neuron to
Brain [12]. With this quotation this author began a paper
which appeared in the 1996 NIST Conference Proceedings
- Intelligent Systems: A Semiotic Perspective [8]. This
paper is a continuation of last year's work and discusses
how very early neural signals (symbols) might be viewed
from a semiotic perspective. SECTION 2 defines and
discusses the signs, interpretants, and objects in a context
of early neural signaling. SECTION 3 contains a
discussion of why these early signals must have contained
control information and how inherent properties of the
integrating mechanism of neurons effect control. In
SECTION 4, we describe the progress made using circuits
consisting of two input neurons and one output neuron
(Three Neuron Controllers or TNCs). We combined two
TNCs, each of whose output serves as input to yet a third
TNC (creating a Seven Neuron Controller - Figure 3), and
solved the Inverted Pendulum problem. Further, our
Seven Neuron Controller net was trained using a genetic
algorithm. SECTION 5 contains a discussion of why
TNC control is analogous to fuzzy control. This
information is included for completeness and summarizes
the discussion given in last year's paper [8]. In SECTION
6, we reiterate earlier discussions [3] [4] (speculations) of
how elements of intelligence, as Albus describes them,
could have evolved. Further, we look at the sign,
interpretant, object triad, and show that it, too, admits a
description couched in terms of the evolution of a simple
neural model. SECTION 7 summarizes this paper.

SECTION  2 

An  article  in  the  March-April  issue  of  the 
American  Scientist  [9]  (the  Sigma  Xi  magazine)  entitled 
The  Origin  of  Animal  Body  Plans  discusses  the 
"Cambrian  Explosion."  Our  speculation  affords  a 
plausible  commentary  on  this  phenomenon,  that  is,  the 
rapid  increase  in  the  number  and  variety  of  animals  on  the 
earth  which  occurred  about  530  million  years  ago.  This 
commentary  on  the  Cambrian  Explosion  will  be  given  in 
SECTION  7.  To  discuss  the  evolution  of  intelligence 
from  a  semiotic  perspective,  we  must  consider  the  neural 
activity  of  the  earliest  of  animals  possessing  neural  control 
mechanisms.  Thus,  from  the  semiotic  perspective,  since 
the  stereotyped  signals  are  virtually  identical  in  all  nerve 
cells  of  the  human  body  and  also  very  similar  in  different 
animals  [12],  we  assume  that  the  information  in  the  signs 
to  be  studied  is  contained  in  the  frequency  of  firing  of  the 
neural  signals  (action  pulses),  that  the  interpretant  is 
simply  the  neuron  that  integrates  the  neural  signals  from 
the  impinging  neurons,  and  that  the  object  which  caused 
the  sign  to  be  generated  is  either  an  external  event 
reported  to  the  organism  by  means  of  some  sensory  neural 
circuitry  or  a  signal  that  was  generated  by  some  internal 
mechanism  such  as  the  output  of  a  pacemaker  neuron. 


SECTION  3 

Let us focus our attention on a very elementary
form of life, a coelenterate, the jellyfish. These creatures
have no associative memories [13], yet they can swim and
attack prey. Thus, creatures exist with only control neural
circuitry and no associative memory neural circuitry.
Further, we see no biological advantage to a creature
possessing associative memory neural circuitry but no
control circuitry. Thus it seems plausible to conclude that
control neural circuitry evolved before associative memory
neural circuitry. Certainly one neuron signaling a second
is the simplest neural net of all possible nets (of more than
one neuron) to envision. The sending neuron in a
two-neuron circuit can control the receiving neuron by either
enhancing its firing rate (excitatory) or inhibiting it
(inhibitory). Cortical neurons, because of the random
firings of the massive number of neurons to which they
are connected, possess a non-zero average firing rate [1].
It seems unlikely, however, that sensory or motor neurons
of very early organisms had a non-zero firing rate. It
would appear that the ability of one neuron to inhibit the
firing of another must have evolved either to inhibit self-firing
(pacemaker) neurons, or after the evolution of circuits
with two or more neurons. Inhibiting self-firing or
pacemaker neurons gives rise to the ability to generate
rhythmic or cyclic firing patterns [10]. Rhythmic patterns
are used to control the heartbeat and swimming of the
leech. The flow of oxygen-carrying blood eliminates the
need for oxygen to diffuse through an animal's body.
Creatures that rely on diffusion of oxygen are of necessity
flat [9].

Adding  a  third  neuron  to  a  two  neuron  circuit 
could  be  done  by  adding  the  third  so  as  to  place  three  in 
line,  or  adding  the  third  in  parallel  to  the  existent  sending 
neuron  so  that  the  receiving  neuron  has  two  inputs. 


Figure  1  A  Three  Neuron  Controller  (TNC) 

This latter circuit is pictured in Figure 1. This circuit
allows control of two different input signals. This is the
property of neurons which we assert is both natural and
useful. We have successfully used such circuits as
controllers, and have named them Three-Neuron
Controllers (TNCs). TNCs have been used to control the
height of water in a tank at a constant set point under
conditions of random inflow [5] and to back a truck to a
loading dock [6]. In both the water-tank and the
truck-backing problems, two neurons represent the inputs
of the problem, while the third or output neuron
represents or codes the control signal.

In Figure 1, we have assumed that the input
neurons (m1 and m2 name both the input neuron and the
value of its output) possess some average firing rate called
Ave_j. The input signal that the two input neurons receive
(Observ_j) is on [Min_j, Max_j] and is mapped onto [-1,1] by
the following formulas.

\[ m_j = \frac{\mathrm{Observ}_j - \mathrm{Ave}_j}{\mathrm{Max}_j - \mathrm{Ave}_j} \quad \text{if } \mathrm{Observ}_j > \mathrm{Ave}_j \tag{1} \]

\[ m_j = \frac{\mathrm{Observ}_j - \mathrm{Ave}_j}{\mathrm{Ave}_j - \mathrm{Min}_j} \quad \text{if } \mathrm{Observ}_j \le \mathrm{Ave}_j \tag{2} \]

We further assume that the weights connecting
the input neurons to the output neuron, w11 and w12, are
constrained to be on [-1,1] also. Thus, the input signals
which a(t), the output neuron, receives are on [-1,1].

Thus far, we have studied two sets of non-linear
differential equations which describe the flow of activation
in Figure 1 [7] [8]. Both sets of equations constrain the
output neuron a(t) to be on (0,1). The first set studied was
taken from the author's dissertation and called the RX
equations. Most recently we have worked with
McClelland and Rumelhart's iac (interactive activation
and competition) equations [11], translated from difference
to differential equation form, which we call the ENIAC
Equations (for extended iac). Their solutions are given in
Equation 3 when the weighted sum of the inputs m1 and m2
is greater than zero, and Equation 4 when this sum is
below zero.

[Equations 3 and 4]

Observe that in Equation 3, when t = 0, a(t) = a_0, and when t is
infinite, a(t) = 1. In Equation 4, at t = 0, a(t) is also a_0,
and when t is infinite, a(t) = 0. The output neuron a(t) has
a resting value, a_0. We generate the actual control output
by subtracting from a(t) the resting value and then scaling
the result onto the range of possible control values. For
example, in the truck backing problem, the steering wheel
is limited to plus or minus 15 degrees and the resting
value of a(t), a_0, was assumed to be 0.5. Thus:


Control Signal = (a(t) - 0.5) * 30        (5)

Note that when a(t) = 0, the control signal is -15 degrees,
and when a(t) = 1, the control signal is +15 degrees, and
all values in between are attainable.
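
The input normalization and the output scaling of Equation 5 can be sketched in Python as below. The activation step is a stand-in only: it is NOT the RX or ENIAC equations. It assumes the simplest dynamics with the limiting behaviour stated above (the activation starts at the resting value, tends to 1 for a positive weighted input sum and to 0 for a negative one). The function names, the integration time `t_end` and the numeric ranges in the usage note are hypothetical.

```python
import math

def normalize(observ, lo, ave, hi):
    """Map an observation on [lo, hi] onto [-1, 1], with the average mapped to 0."""
    if observ > ave:
        return (observ - ave) / (hi - ave)
    return (observ - ave) / (ave - lo)

def tnc_output(m1, m2, w11, w12, t_end=1.0, a_rest=0.5):
    """Activation of the output neuron after integrating up to t_end.

    ASSUMPTION (stand-in dynamics, not the RX or ENIAC equations):
      net > 0:  da/dt = net * (1 - a)  -> a rises from a_rest toward 1
      net <= 0: da/dt = net * a        -> a decays from a_rest toward 0
    """
    net = w11 * m1 + w12 * m2
    if net > 0:
        return 1.0 - (1.0 - a_rest) * math.exp(-net * t_end)
    return a_rest * math.exp(net * t_end)

def control_signal(a, a_rest=0.5, span=30.0):
    """Equation 5: subtract the resting value, then scale to the actuator range."""
    return (a - a_rest) * span
```

For instance, `control_signal(tnc_output(normalize(7.0, 0.0, 4.0, 10.0), 0.0, 1.0, 1.0))` yields a positive steering angle below +15 degrees, since the single active input lies above its average.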

SECTION  4 

Other problems solved by extending to three
TNCs are backing a cab with a trailer (truck and trailer) to
a loading dock [7] and, more recently (discussed below),
controlling (keeping balanced) an inverted pendulum positioned
on a cart and subject to random forces which tend to
unbalance it. Both the truck and trailer and the inverted
pendulum problems have more than two inputs. For more
information on the truck and trailer case, the reader is
referred to reference [7]. In the Inverted Pendulum (IP)
problem, pictured in Figure 2, there are four inputs: x and
dx/dt, the position and velocity of the cart, and θ and dθ/dt,
the angle of the pendulum with the vertical and its angular
velocity. The task is to keep the pendulum
upright, and the cart centered over x = 0, while the cart
and pendulum are subject to random disturbing forces.
Control is via a motor on the cart. The control task is to
calculate the appropriate torque to the motor.

The  architecture  of  the  neural  net  used  to  solve  the  IP 
problem  is  displayed  in  Figure  3.  This  is  the  first  problem 
where  we  needed  to  train  the  weights.  It  seems  apparent 
that  in  the  IP  problem,  the  control  of  the  pendulum 
must  be  of  higher  precedence  than  control  of  the  cart's  x 
position.  The  pendulum  must  not  fall  and  hence,  it  must 
be  controlled  first.  Only  when  it  is  stable,  does  it  become 
possible  to  "walk"  the  cart  back  to  the  position  x  =  0. 

We designed a genetic algorithm to determine the
appropriate weights for the system. The only weight
changed was the one from the x associated TNC to the
final control node. The value of this weight generated by
the genetic algorithm was 0.13. This reflects the fact that it
is necessary to first control the pendulum.


Figure  2  The  IP  Problem 


Figure  3  The  Seven  Neuron  Controller 

SECTION  5 

We  have  compared  all  of  our  neural  control 
answers  to  those  generated  by  a  fuzzy  controller,  with  the 
exception  of  the  IP  problem.  The  results  produced  by  all 
controllers  were  very  similar.  It  is  easy  to  see  the 
mathematical  reason  for  this.  The  first  two  of  the 
problems  with  which  we  worked  have  had  monotone 
control  surfaces.  By  this  we  mean  that  if  one  of  the  inputs 
is  held  fixed  at  any  point  on  its  domain  of  [-1,1],  and  the 
other  input  value  allowed  to  increase  from  its  minimum 
value  of  -1,  up  to  its  maximum  value  of +1,  the  control 
surface  will  either  increase  (monotone  increasing)  or 
decrease  (monotone  decreasing)  over  the  complete  range 
of  input  values.  This  is  true  of  both  inputs.  But  it  is  not 
true for all possible problems. Our model accepts the
normalized inputs, uses them as constant parameters in
the  function  to  be  integrated,  and  integrates  over  time  up 
to  a  fixed  upper  limit  of  time  (which  happens  to  be 
different  for  each  model,  RX  or  ENIAC).  Different 
functions  are  used  as  integrands  depending  on  whether  the 
input  parameters  are  negative  or  positive.  However,  in 
each  case  the  integral  is  also  a  monotone  increasing  or 
decreasing  function  of  the  inputs.  That  is,  the  greater  the 
positive  input  (up  to  its  maximum  of  1)  the  larger  will  be 
the  value  of  the  integral  (and  hence  control  signal).  The 
larger  in  absolute  value  (closer  to  -1)  the  negative  input, 
the  smaller  will  be  the  value  of  the  integral.  The  smallest 
value  possible  for  the  integral  is  zero.  By  Equation  5 
above,  when  a(t)  (the  value  of  the  integral)  is  zero,  then 
the  control  signal  will  be  as  large  in  the  negative  direction 
as  is  possible. 

Note  that  none  of  the  solutions  calculated  are  in 
any  way  optimal.  They  are  just  solutions  to  a  problem  in  a 
dense  set  of  solutions.  There  are  an  infinite  number  of 
ways  the  fuzzy  membership  functions  can  be  selected; 
also, one may choose from an infinite number of shapes for
each  membership  function.  Further,  a  wide  variety  of 
defuzzification  procedures  may  be  employed.  What  we 
have  observed  is  that  each  reasonable  method  of  solution 
that  we  have  studied  thus  far  has  produced  solutions 
"near"  each  other  in  the  solution  space;  that  is,  they 
successfully  control  the  process.  Figure  4  shows  why  the 
fuzzy  solutions  are  closely  approximated  by  the  neural 
techniques.  For  each  fuzzy  rule,  there  is  a  corresponding 
integral  which,  because  of  the  monotonicity  of  the  control 
surface,  gives  a  similar  result  for  a  control  value,  as  does 
the fuzzy rule that it emulates. The inputs to the integrals
are crisp, as are the integrals' upper limits. How can we
claim that the process represents a fuzzy form of control?
This claim is made in recognition of the infinite number
of functions that could be used as integrand and give
similar answers. Also, slight variation in the upper limit
of the integral would affect the solutions very little. Thus,
our fuzziness is in function space.


SECTION  6 

Figure  5  displays  what  Albus  [2]  called  an 
Element  of  Intelligence.  In  recent  articles  [3]  [4],  the 
author  suggested  that  Albus'  model  [2]  admits  an 
evolutionary  explanation  as  follows.  The  sensors  and 
actuators  evolved  first.  These  correspond  to  simple 
sensory  neurons  directly  connected  to  motor  neurons. 
Note  that  these  circuits  at  first  involve  no  memory.  As  the 
evolutionary  process  continued,  refined  sensory  processing 
and  behavior  generation  abilities,  which  involved 
memory,  would  mean  greater  success  from  a  biological 
standpoint  to  the  organism  possessing  them.  The 


functions  performed  by  the  World  Models  and  Value 
Judgement  elements  could  further  enhance  survival  of  the 
possessing  organism.  The  referenced  articles  [3]  [4] 
further  suggested  that  the  initial  components,  the  sensors 
and  actuators,  could  be  emulated  by  a  TNC,  with  the  input 
neurons  playing  the  role  of  the  sensors,  and  the  output 
neuron  the  role  of  the  actuator. 

We  suggest  that  this  model  appears  logical  from 
the  semiotic  perspective  also.  As  stated  in  SECTION  2, 
the  sign  of  the  semiotic  triad  is  represented  by  the 
frequency  of  firing  of  input  neurons.  Initially  the 
interpretant  is  simply  a  neuron  that  integrates  the  inputs 
of  its  impinging  neurons.  In  order  not  to  distort  its 
message,  the  object  of  the  triad,  existing  perhaps  in  the 
external  world  or  perhaps  internally  as  a  self-generating 
neuron,  should  map,  in  a  linear  fashion,  the  degree  to 
which  the  object  is  to  be  considered  by  the  interpretant. 
From  a  practical  standpoint,  pressure  sensitive  neurons  are 
linear  over  a  wide  range  of  inputs  [10].  That  is,  the 
frequency  of  firing  of  the  sensory  neurons  in  the  skin 
which  measure  pressure  applied,  may  be  represented  as  a 
constant  times  the  pressure  applied.  Non-linearities  are 
introduced  by  the  integration  accomplished  by  the 
interpretant.  From  a  biological  standpoint,  this  sequence 
appears plausible. The sign is a direct measure of (i.e.
directly proportional to) the magnitude of the object. The
interpretant may apply a gain (introduced by the
non-linearity) to the sign representing the object as best suits
the survival of the using organism. The reader should
keep in mind that the system being discussed contains no
memory.

Figure 4 Neural and Fuzzy Rules

SECTION 7

Viewed  as  the  evolutionary  beginnings  of  one  of 
Albus'  elements  of  intelligence,  or  from  a  semiotic 
standpoint,  or  simply  from  the  standpoint  of  how 
intelligence  might  have  evolved,  this  model  seems 
relevant.  Neurons  do  integrate  their  inputs,  and  we  have 
shown  that  for  a  large  class  of  problems,  and  for  two 
different  functions,  that  this  integration  can  yield  useful 
results. Since neural circuitry evolved in the Cambrian
Period, it does not seem unreasonable to the author
to assume that this property of neurons increased
the survivability of those organisms possessing neurons to
the extent that it might have contributed to the
Cambrian Explosion.

This  paper  is  highly  speculative.  Speculation 
must  be  constrained  in  two  ways  as  stated  by  Smith  and 
Szathmary  [14].  "First  each  event  must  be  explained  in  a 
way  that  is  consistent  with  the  general  theory  of 
evolutionary  change,  the  theory  of  natural  selection. 
Second,  an  adequate  account  of  the  origin  of  any  system 
must  explain  the  peculiarities  of  the  system  as  it  exists 
today..." 

The example of the jellyfish has led us to believe that control
neural  circuitry  existed  before  associative  memory  neural 
circuitry.  In  the  interest  of  enhancing  our  understanding 
and  use  of  ANNs  as  associative  memories,  it  would  appear 
that  a  serious  effort  should  be  undertaken  to  understand 
how  memory  abilities  evolved  from  control  abilities. 


Figure 5 Albus' Elements of Intelligence

ABSTRACT

Evolvable  hardware  addresses  hardware  that  self-organizes/ 
reconfigures  under  the  guidance  of  evolutionary  mechanisms. 
Some  experiments  in  evolving  at  transistor  level  are  briefly 
presented  and  the  perspective  of  transistors  as  functional 
approximators  is  suggested.  In  a  broader  context  one  analyses 
approaches  to  evolvable  hardware  specifying  the  level  of 
granularity  at  which  evolution  will  operate,  in  accordance  with  the 
level  of  design  abstractions  in  the  modeling  hierarchy:  primitive, 
functional  and  behavioral  levels  in  simulated  circuits,  and 
transistor,  subcircuit  and  high  level  function  in  hardware. 
Comments  are  made  linking  the  role  of  Automatically  Defined 
Functions  in  hardware  evolution,  the  process  of  achieving 
higher/coarser  levels  of  granularity  in  the  semiotic  perspective,  and 
the evolution of modularity of organismic designs. Finally, testing
the effect of changing levels of granularity during hardware
evolution is suggested.

KEYWORDS:  evolvable  hardware,  granularity,  evolvability 

1.  EVOLVABLE  HARDWARE 

Evolution  appears  to  be  nature's  solution  to  design. 
Being  able  to  replicate  such  a  capability  in  an  artificial 
system  would  offer  tremendous  insights  into  ourselves  and 
a  powerful  tool  for  building  adaptive,  intelligent  systems. 

From  the  perspective  of  space  exploration,  empowering 
spacecraft  with  adaptive,  intelligent  capabilities  is 
invaluable  for  autonomy.  Adaptive  features  are  needed  to 
cope  with  the  uncertainty  of  spacecraft  remote  operating 
conditions,  performing  totally  unexpected  functions,  and  for 
fault-tolerance.  The  on-board  computer  needs  to  be  able  to 
solve  problems  for  which  solutions  were  not  specified  on 
ground,  and  command  the  spacecraft  to  adapt  to  new 
situations.  For  adaptive,  versatile  spacecraft,  electronic 
hardware must possess the capability to reconfigure, or
moreover,  to  self-reconfigure,  as  needed. 

Evolvable  hardware  (EHW)  is  adaptive  hardware  that 
reconfigures  under  the  control  of  an  evolutionary  algorithm 
[1].  Extrinsic  EHW  refers  to  evolution  in  a  software 
simulation  using  models  of  the  hardware  behavior, 
downloading  the  configuration  of  the  best  evolved 
architecture to programmable hardware. In intrinsic
EHW (to which most of the following discussion
applies) configuration bits are iteratively downloaded to
hardware, and the degree of adaptation/fitness is evaluated by
observing the behavior of the real hardware.

Hardware evolution is performed through a succession of
changes of elementary cell functions and cell
interconnectivity pattern, thus obtaining increasingly fit
configurations until a target functionality is reached. As is
the case in nature, evolution results in individuals that are
increasingly adapted to their environments, and can
change themselves to match changes in environments and
modifications of their own goals. Unlike in nature,
evolution in silicon has the advantage that it could be
extremely rapid, with millions of generations of "living"
circuits evaluated in only a few seconds.
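
The loop described above can be sketched as a plain bit-string evolutionary search. This is a generic illustration, not the scheme used in the experiments reported below. The `evaluate` argument stands in for downloading a configuration to the device and measuring its behavior; the truncation selection, the bit-flip rate and all names are assumptions made for the sketch.

```python
import random

def evolve(evaluate, n_bits, pop_size=20, generations=100,
           p_flip=0.02, target=None, seed=0):
    """Evolve configuration bit strings until `target` fitness or the generation limit.

    `evaluate(bits)` is a HYPOTHETICAL stand-in for configuring the hardware
    with `bits` and scoring the observed response (higher is better).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        if target is not None and evaluate(pop[0]) >= target:
            break  # target functionality reached
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged
        children = [[bit ^ int(rng.random() < p_flip) for bit in p] for p in parents]
        pop = parents + children
    return max(pop, key=evaluate)
```

Because the fitter half is always retained, the best fitness in the population never decreases from one generation to the next in this sketch.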

Hardware  evolution  can  be  seen  as  an  on-chip  search  for 
the  circuit/configuration  whose  behavior  is  closest  to  the 
required  one  (e.g.  gives  best  performance/adaptation  to  the 
environment).  The  suitability  for  a  parallel  hardware 
implementation  of  evolvable  hardware,  with  multiple 
"islands"  of  concurrently  evolving  circuits  on  the  same  chip, 
or  in  a  multi-chip  or  stacked  configuration  is  very  attractive. 

Fig. 1. Parallel implementation for evolvable hardware (figure title: "Evolvable hardware: a fully parallel process")

The  granularity  of  hardware  building  blocks  for  those 
attempting  intrinsic  evolvable  hardware  is  currently 
influenced  by  the  availability  of  certain  programmable 
devices. The paper presents results of simulated evolution at
transistor level and discusses the role of the level of
granularity and evolvability.

2. EXPERIMENTS IN EVOLVING CMOS
CIRCUITS  AT  TRANSISTOR  LEVEL 

Successful  evolution  has  been  reported  in  simulations 
(analog  [2],  [3]  and  digital  [4]  )  and  in  real  hardware  [5] 
[6].  In  the  intrinsic  EHW  perspective,  the  focus  has  been 
on  using  commercially  available  FPGAs  (Field 
Programmable  Gate  Arrays)  but  custom  designs  of  chips 
using  higher-level  functional  blocks  are  also  reported  [6]. 
A  collection  of  papers  dedicated  to  the  subject  is  [18]. 

Here the focus of attention is on issues related
to designing the reconfigurable part of an evolvable
CMOS chip. The choice is motivated by the fact that
CMOS  technology  is  the  basis  of  today's  microelectronics 
industry,  and  NMOS/PMOS  transistors  are  the  elementary 
components  of  both  analog  and  digital  designs. 

One  can  look  at  MOS  transistors  from  the  perspective  of 
function  approximators.  Function  approximation  has 
recently  been  approached  with  computational  intelligence 
techniques,  demonstrating  general  approximation 
capabilities  for  structures  of  neural  networks  [7]  [8]  and 
fuzzy  systems  [9]  [10].  As  hardware  implementations 
ultimately  rely  on  silicon,  a  general  functional 
approximator  (FA)  implemented  in  hardware  will 
ultimately  be  relying  on  (e.g.)  MOS  transistors. 

A  set  of  experiments  was  performed  to  investigate 
evolution-related  issues  at  transistor  and  simple  sub-circuit 
level.  The  objective  was  to  evolve  a  circuit  that  provided 
a  bell-shaped  response  when  the  input  increased  linearly. 
This  response  can  be  obtained  for  example  with  the  circuit 
in  Fig.  2,  with  one  input  kept  at  constant  voltage  and  the 
other increasing linearly. The evolution was performed on
simulated circuits (using SPICE). Constraints were
imposed on the mechanism generating the circuits such
that all the circuits produced could be simulated in SPICE (this
differs from Koza's experiments, where non-simulatable
circuits are eliminated by evolution, e.g. in [11]). A
limitation existed also on the number of circuits evaluated,
which was much smaller than those reported by Koza
(~10^4 compared to ~10^7).


Fig. 2 A circuit producing a bell-shaped response (Gaussian function circuit with inputs Vin1 and Vin2)

In the first set of experiments the circuit topology
was considered known, and evolution concerned two types
of parameters: transistor channel Width and Length.
These parameter domains are discrete (they are multiples
of the feature size), and for this case a total of 8 possible
widths and 8 lengths was considered. This led to a
(3+3)*8 = 48 bit coding for the circuit. Applying a Genetic
Algorithm to a population of 50 individuals converged
easily to good solutions (an illustration of the population
after about 200 generations is given in Fig. 3). This is
explained by the fact that, although the region searched is
small (~2^10 in the search space of 2^48 possible
combinations), many combinations are good; in this case
the topology is the fundamental factor in circuit behavior.

Fig. 3 Response of a population of 49 circuits after 200
generations
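
The 48-bit coding can be illustrated with a small decoder. This is a sketch under assumptions: the factor of 8 in (3+3)*8 is read as eight transistors, each carrying a 3-bit width code followed by a 3-bit length code, and code k is taken to mean (k+1) times the feature size. The gene order, the size mapping and the names are hypothetical.

```python
N_TRANSISTORS = 8    # ASSUMPTION: the factor 8 in (3+3)*8 counts transistors
BITS_PER_PARAM = 3   # 8 allowed values per parameter -> 3 bits

def bits_to_int(bits):
    """Interpret a sequence of 0/1 values as an unsigned binary number."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def decode(chromosome, feature_size=1.0):
    """Turn a 48-bit chromosome into a list of (width, length) pairs."""
    gene = 2 * BITS_PER_PARAM
    if len(chromosome) != N_TRANSISTORS * gene:
        raise ValueError("expected a 48-bit chromosome")
    sizes = []
    for t in range(N_TRANSISTORS):
        w_bits = chromosome[t * gene : t * gene + BITS_PER_PARAM]
        l_bits = chromosome[t * gene + BITS_PER_PARAM : (t + 1) * gene]
        # ASSUMPTION: code k selects (k + 1) multiples of the feature size.
        sizes.append(((bits_to_int(w_bits) + 1) * feature_size,
                      (bits_to_int(l_bits) + 1) * feature_size))
    return sizes
```

Every one of the 2^48 bit strings decodes to a valid sizing of the fixed topology, which is consistent with the observation that a small search effort already finds many good circuits.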

The  problem  becomes  complicated  when  one  tries  to 
evolve  the  topology.  Several  alternatives  were  tried, 
briefly  described  in  the  following: 

- A 2D transistor array was considered, which resembled
an existing FPGA model (Xilinx 6216). The code for each
cell specified the type of the cell, which could be a
transistor with a certain orientation or a type of wire
routing (these are illustrated in Fig. 4). A 2D chromosome
was associated with the array. Only evolution by GA was
attempted. The difficulty lies in specifying 2D
crossover, i.e. determining which 2D zones should be
swapped. Crossover must take into account that the new
circuits must match their connections at cell borders.


Fig.  4  A  transistor  array  patterned  after  an  FPGA 

- A leveled architecture (with a matrix arrangement as in
Fig. 2), where components on a given level were taken from a
list of allowed components (see Fig. 5). Components
were low-level subcircuits (current mirror, differential
pair, pair of transistors). The coding was such that the
possible connections were limited to those that made
sense (e.g. the Source terminal of a transistor in level 4
could not connect to VDD).


Fig. 5 Components allowed at different levels (ST: Separated Transistors; CM: Current Mirror; DP: Differential Pair)
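
The 2D crossover difficulty mentioned for the transistor-array representation can be made concrete with a small sketch. It assumes that each chromosome is a rectangular grid of cell codes and that crossover swaps one randomly placed rectangular zone; the function name `crossover_2d` is hypothetical, and the sketch deliberately does not check that connections still match at the borders of the swapped zone, which is exactly the open problem noted above.

```python
import copy
import random


def crossover_2d(parent_a, parent_b, rng=random):
    """Swap one randomly placed rectangular zone between two equal-sized grids."""
    rows, cols = len(parent_a), len(parent_a[0])
    r0 = rng.randrange(rows)
    r1 = rng.randrange(r0, rows)      # inclusive bottom row of the zone
    c0 = rng.randrange(cols)
    c1 = rng.randrange(c0, cols)      # inclusive right column of the zone
    child_a = copy.deepcopy(parent_a)
    child_b = copy.deepcopy(parent_b)
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            child_a[r][c] = parent_b[r][c]
            child_b[r][c] = parent_a[r][c]
    return child_a, child_b
```

A repair or validity check on the cells along the zone boundary would have to follow this step before the children could be simulated.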

The  experiments  can  be  interpreted  as  follows: 

•  When  the  topology  of  the  electronic  circuit  is  given, 
the  transistor  parameters  (in  this  case  PMOS/NMOS 
channel  Width  and  Length)  can  be  easily  obtained 
through  evolution. 

• When evolution was attempted in a space restricted to the
size of the human-designed solution (in our case the
circuits were limited to 8 transistors), it was not
possible to evolve a circuit from specifications. The
methods attempted included genetic algorithms and a
search based on orthogonal arrays. With the
representation used, the search space appears very
much like a flat region with a single spike at the
solution circuit. Its neighbors, with genetic code
differing by even one bit from the solution, had extremely
low fitness; i.e. changing even one connection
between 2 transistors had a dramatic effect on circuit
behavior. This is possibly a consequence of searching
around the optimal-size (minimal number of
transistors) solution. Had the circuit contained
redundant circuitry, changes in those regions
would have produced less significant effects.
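
The "flat region with a single spike" can be illustrated with a toy comparison. This is not a circuit simulation: `needle_fitness` is a hypothetical stand-in for the landscape described above, and `graded_fitness` is a contrasting landscape in which one-bit neighbors of the solution still score well. The bit-string target is an arbitrary assumption.

```python
def needle_fitness(bits, target):
    """Spike landscape: only the exact solution scores; everything else is flat."""
    return 1.0 if bits == target else 0.0


def graded_fitness(bits, target):
    """Smooth landscape: fitness is the fraction of bits matching the target."""
    matches = sum(b == t for b, t in zip(bits, target))
    return matches / len(target)


def one_bit_neighbors(bits):
    """All bit strings that differ from `bits` in exactly one position."""
    flip = {"0": "1", "1": "0"}
    return [bits[:i] + flip[bits[i]] + bits[i + 1:] for i in range(len(bits))]
```

On the spike landscape every neighbor of the solution scores zero, so selection receives no signal that it is close; on the graded landscape the same neighbors score just below the maximum and can guide the search.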

It was, however, possible to evolve the topology using
genetic programming and an embryonic, growth-based
approach, in which the number of components (size of
the circuit) was not restricted (results obtained by Koza's
group, not published yet). The first set of simulations led
to a circuit with 36 transistors.

Evolving on a structure with a fixed, minimal number
of transistors for the function (for which a solution
circuit is already known) suffers from the fact that any
mutation involving a connection/topology change leads
to a dramatically different response. One possible way in
which this can be alleviated is to allow gradual
connections (e.g. connections modeled by resistors in [0,
100G]) during the evolution. Thus, architectures
that allow gradual transitions between topologies may be
possible. Such an architecture could look like a densely
connected "sea of transistors" in which components are
connected by a "fuzzy" wire (i.e. with values in [0,1]
rather than {0,1}), appearing much like a neural network,
but with transistors instead of neurons. The graded
connection would have a catalyst role during evolution; it
smooths the search space around the solutions (in this
respect neural architectures using gradual
connections/weights appear suitable for evolution).
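
One way to picture such a "fuzzy" wire is to map a connection strength in [0,1] to a resistance that could be placed between two nodes in a simulated netlist. The sketch below is an assumption made for illustration: it interpolates logarithmically between a hypothetical low resistance `R_ON_OHMS` (1 ohm is used instead of 0 so that the logarithm is defined) and the 100G upper bound mentioned above. The function name `wire_resistance` is likewise hypothetical.

```python
import math

R_ON_OHMS = 1.0       # hypothetical resistance of a fully "connected" wire
R_OFF_OHMS = 100e9    # 100 G-ohm: treated as effectively "disconnected"


def wire_resistance(strength):
    """Map a connection strength in [0, 1] to a resistance in ohms.

    strength = 1 gives R_ON_OHMS, strength = 0 gives R_OFF_OHMS, and values
    in between are interpolated on a logarithmic scale.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    log_r = ((1.0 - strength) * math.log10(R_OFF_OHMS)
             + strength * math.log10(R_ON_OHMS))
    return 10.0 ** log_r
```

With such a mapping a small mutation of the strength changes the resistance by a bounded factor instead of making or breaking a connection outright, which is the smoothing effect argued for in the text.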


3. LEVELS OF GRANULARITY AND HARDWARE EVOLUTION

A fundamental question is how the choice of a level
of design abstraction influences hardware "evolvability".

(The search/optimization algorithm may not be that
important: the "no free lunch" theorems establish that for
any algorithm, any elevated performance over one class of
problems is offset by performance over another class [12].)
When evolution takes place directly in the component space,
the choice of the primitive building blocks affects the
evolvability (e.g. evolving a NN appears easier than
evolving a circuit with transistors).

The choices for the level at which the evolution process
could operate are as follows.

For  evolving  simulated  circuits.  The  levels  of  design 
abstraction  in  the  modeling  hierarchy  are  [13]: 

• Primitive Devices (transistor level: MOS, BJT, etc.),
represented by analytical equations or tables

• Functional Macromodels (e.g. Op. Amp. level),
derived by circuit simplification, circuit build-up,
symbolic methods

• Behavioral: high-level language descriptions
(linear and nonlinear mathematical equations, tables,
etc.)

Evolution  can  be  made  at  a  certain  level  and  then  the 
design  converted  for  hardware  implementation  with 
appropriate  compilers  and  synthesis  tools. 

For  evolving  directly  in  hardware.  A  similar 
selection  of  level  of  primitives  is  available  in  a 
reconfigurable  structure  (specifying  what  should  be  the 
"building  blocks"): 

•  Transistor  level.  Some  simulations  were  discussed 
in  the  previous  section. 

• Subcircuit level. This is the Op. Amp./digital
gates level. Successful hardware evolution taking
place on an FPGA chip was reported by Thompson
[5]. Versatile FPAAs (Field Programmable Analog
Arrays) are still missing, but are expected to appear
on the market within 2-3 years.

• High level/functional level. One way of coping with
difficulties in evolving circuits is to provide high-level
building blocks: filters and modulators in the
analog domain, or adders and MUXs in the digital
one. This approach is followed by Higuchi [6].

The idea of modularity, and of changing granularity during
evolution, may be key to hardware evolvability (in
intelligent  systems,  knowing  presumes  changing  resolution 
or  granularity  [14]).  In  the  biological  world  modularity  is 
most  likely  the  result  of  evolutionary  modifications  [15].  In 
the  evolvable  hardware  context,  Koza  has  indicated  the 
usefulness  of  Automatically  Defined  Function  (ADF)  for 
evolving  analog  electronic  circuits  (an  ADF  is  "a  function 
(subroutine,  module,  etc.)  that  is  dynamically  evolved 
during  a  run  of  genetic  programming  and  that  may  be  called 
by  a  program  that  is  concurrently  being  evolved")  [16].  In 
effect  the  idea  that  no  approach  to  automated  programming 
(and  consequently  hardware  evolution)  is  likely  to  be 
successful  on  non-trivial  problems  unless  it  provides  "some 
hierarchical  mechanism  to  exploit,  by  reuse  and 
parametrization,  the  regularities,  symmetries,  homogeneities, 
similarities,  patterns,  and  modularities  inherent  in  problem 
environments"  is  central  to  Koza's  book  [17]. 

When  choosing  representation  levels,  it  may  be 
interesting  to  consider  having  simultaneous  representations 
at different levels of granularity. During evolution one may
switch between levels of granularity depending on which one
offers advantages at that time.

4.  CONCLUSION 

The  paper  presented  simulation  results  and  discussed 
issues  of  evolution  at  transistor  level.  While  parametric 
evolution  appears  easy,  topological  evolution  is  hard. 
Different  levels  of  granularity  are  available  for  evolvable 
hardware.  It  was  suggested  that  changing/switching  levels  of 
granularity  during  evolution  may  help. 
Abstract

This  paper  deals  with  rather  basic  implications  of  theorizing 
systems  which  are  able  to  learn.  Starting  from  the  fact  that  a 
unified  theory  of  information  is  still  missing  it  tries  to  sketch 
roughly  a  conceptualization  that  merges  the  concept  of 
semiosis  and  the  concept  of  evolutionary  systems.  This  shall 
provide  a  framework  for  future  research  into  every  kind  of 
information  processing  systems,  including  socio-technical 
systems. 

KEYWORDS:    Unified   Theory   of  Information,  semiosis, 
evolutionary  systems 

1. The Quest for a Unified Theory of
Information

The  Second  Conference  on  the  Foundations  of  Information 
Science  (FIS  96)  taught  us  that  we  are  still  lacking  a  Unified 
Theory  of  Information  (UTI)  which  comprises  the  variety  of 
manifestations  of  information  processing  in  natural,  in  social, 
and  in  artificial  systems.  It  showed  at  the  same  time  that  the 
UTI  we  are  looking  for  may  be  well  based  upon  a  theory  of 
evolutionary  systems,  that  is  open  systems  which  are  capable 
of  self-organization  [1].  This  theory,  however,  is  to  be 
elaborated,  too. 

With the transition from System Theory I to System Theory
II, as with the change from Cybernetics I to Cybernetics II
and the increased scope of the Theory of Evolution, we can
see a theory of open, non-linear, complex, dynamic,
self-organizing (in short: evolutionary) systems approaching. This
theory no longer deals merely with mechanisms, strategies
and controls for achieving/maintaining homeostasis and the
development of species; it also concerns the birth, growth and
decline (i.e. development) of all systems, from the formation
of the earliest known particle, through the arrival of terrestrial
life forms, to the shaping of specific human socio-technical
systems (see e.g. [2], [3], [4]).

As  this  theory  of  self-organization  will  conceive  both  system 
and  evolutionary  aspects,  that  is  both  structural  and  process 
aspects,  I  want  to  argue  here  that  it  offers  the  possibility  of 
serving  as  the  appropriate  background  theory  for  unifying 
information  concepts  and  thus  for  understanding  and  designing 
intelligent  and  reasonable  information  processing  systems  [5]. 

In  order  to  interlink  informational  and  evolutionary-systems 
concepts  they  have  to  undergo  some  refashioning  which  will 
be  described  below  (for  preliminary  work  we  have  already 
done  see  [6],  [7],  [8]).  Firstly,  three  levels  of  a  system  shall 
be  distinguished.  Secondly,  the  well-known  semiotic  aspects 
will  have  to  be  rearranged.  Each  of  these  will  then  be  related 
to  a  certain  system  level.  The  next  step  is  to  point  out  that 
this  fundamental  structure  of  any  information  process  shows 
various  kinds  of  manifestations  according  to  the  evolutionary 
level  the  system  belongs  to.  Three  basic  stages  shall  be 
identified.  Finally,  a  stage  model  describes  the  symmetry 
breaking  sign  process. 

2.  The  Merger  of  the  Concepts  of  Systems 
and  Semiosis 

2.1.  Identifying  Three  System  Levels 

In  identifying  a  system,  three  different  levels  of  the  system's 
dynamics  can  be  distinguished: 

1. The level on which the elements of the system in question
are interconnected. This is the level of the internal structure of
the system (micro-level).

2. The level on which the system itself is in one state or in
another (meso-level).

3.  The  level  on  which  the  system  exhibits  its  external 
behavior  vis-a-vis  its  environment.  The  way  the  system 
interacts  with  its  co-systems  in  the  net  is  examined  here 
(macro-level). 

Each  level  is  the  base  on  which  the  next  level  is  built;  each 
level  has  progressively  less  space  for  possibilities  than  the 
one  below.  From  one  level  to  the  next  there  is  a  leap  of 
qualitative  difference.  This  leap  can  (but  need  not)  be  bridged 
by  a  self-organization  cycle.  Thus  these  levels  give  the 
prerequisites  for  understanding  the  emergence  of  new  system 
structures,  system  states,  or  system  behavior,  or  even  the 
emergence  of  new  systems. 


2.2.  Refashioning  Semiotic  Dimensions 

According  to  semiotics,  information  can  be  conceived  as 
something  having  syntactical,  semantical  and  pragmatic 
aspects.  It  seems  sensible  to  interrelate  these  dimensions  not 
in  the  way  semiotics  did  sometimes,  but  rather  as  a  nested 
hierarchy  such  that: 

1.  the  innermost  dimension  is  the  syntactical  one; 

2.  the  intermediate  dimension  is  the  semantical  one; 

3.  the  outermost  dimension  is  the  pragmatical  one. 

This means that they are not merely placed side by side,
without having anything to do with each other; each initial
aspect is essential for the following one, and each following
aspect is a sufficient condition for the preceding one.

Figure 1. System Levels and Semiotic Dimensions in
Information-Processing Systems.


The  three  semiotic  aspects  thus  relate  to  one  another  in  the 
same  way  that  the  system  levels  relate  to  each  other. 

Together  they  mark  the  varying  qualities  of  information.  That 
is  to  say,  in  any  particular  system,  information  occurs  when 
as  a  result  of  a  self-organization  process  there  is  a  qualitative 
change  on  any  one  of  the  three  levels.  The  underlying 
information  process  may  result  in  structural  changes  to  the 
interior  of  a  system  (on  the  micro-level),  it  may  result  in 
changes  to  the  actual  state  of  the  system  (on  the  meso-level), 
or  it  may  result  in  altered  external  behavior  (on  the  macro- 
level).  We  must  note  that  changes  in  the  interior  structure 
need  not  lead  to  changes  in  the  system's  state,  and  that 
changed  states  need  not  necessarily  entail  changes  in  the 
behavior.  But  a  difference  in  the  output  of  a  system  must  be 
based  upon  a  different  state,  and  a  different  state  must  take  as 
a  basis  elements  and  relations  that  differ  from  previous 
structures. 


In  this  way  they  form  an  architecture  which  allows  for 
emergent  properties  [9]. 
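
The asymmetry between the levels stated above can be written compactly. The symbols are introduced here only for illustration and are not part of the original notation: $\Delta S$ denotes a change in internal structure (micro-level), $\Delta Z$ a change of state (meso-level), and $\Delta B$ a change in external behavior (macro-level).

```latex
\Delta B \;\Rightarrow\; \Delta Z \;\Rightarrow\; \Delta S,
\qquad\text{whereas}\qquad
\Delta S \;\not\Rightarrow\; \Delta Z
\quad\text{and}\quad
\Delta Z \;\not\Rightarrow\; \Delta B .
```

Read from right to left, a change on a lower level is necessary but not sufficient for a change on the level above it.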

2.3.  Assigning  the  Semiotic  Dimensions  to  the 
System  Levels 

Having  done  this,  the  system  and  semiotic  aspects  can  easily 
be  linked  to  one  another.  Each  of  the  three  semiotic  aspects 
can  be  related  to  an  appropriate  system  level,  as  follows  (see 
Fig.  1): 

1. syntactics refers to the micro-level;

2.  semantics  refers  to  the  meso-level; 

3.  pragmatics  refers  to  the  macro-level  of  a  system. 


3.  Three  Stages  in  the  Evolution  of 
Information-processing  Systems 

3.1.  Reflecting  as  the  Most  Primitive  Form  of 
Semiosis 

If open systems are exposed to fields in which the gradient of
the free energy density distribution exceeds a certain amount,
they exhibit the capability of organizing themselves in that
they build up order, that is, they make use of energy for
performing work, depreciate it and remove the resulting
entropy (so-called dissipative systems). Pattern formation is
thus the most rudimentary form not only of self-organization,
but also of sign processes (see Fig. 2). The system selects a
particular option from the various structuring possibilities it
has. The selection ends up in a new structure. The information
process is the process of restructuring, and its result is a
structure which differs from the structure the system had
previously. There is a relation between the different structures
which can be looked upon as the most primitive appearance of
the syntactical dimension of information. The way the system
restructures itself is the self-determined way it reflects outer
conditions in its environment (cf. [10], [11]).


Figure  2.  Pattern  Formation  in  Primitive  Self-organizing 
Systems. 


3.2.  Building  Representations  as  a  Higher  Form  of 
Semiosis 

Reflection  is  the  precondition  for  learning  to  behave  in  such  a 
way  that  this  makes  the  environment  produce  conditions 
beneficial  to  the  maintenance  and  improvement  of  the  system. 
Simple  dissipative  systems  are,  however,  not  in  a  position  to 
perpetuate  the  flow  of  energy. 

Biotic  systems  are  a  special  category  of  dissipative  systems 
which  are  able  to  perform  in  this  way.  They  exhibit  division 
into  a  sensorium  and  an  effectorium,  which  involves  two 
cycles  of  self-organization,  one  on  the  top  of  the  other. 
Structural  change  splits  into  structural  and  state/behavior 
change. The self-organized structure that emerges within the
system obtains a new function: it becomes a symbol which
has a meaning, namely that the change in the outer world of
the living system it represents is behaviorally relevant to the
survival of the system (see [12]; see Fig. 3).

Figure 3. The Differentiation of Meanings and Symbols in
Self-organizing Systems.


The  syntactical  dimension  of  the  difference  between  the  old 
and  the  new  structures  is  supplemented  by  the  fact  that  this 
difference  makes  a  difference  regarding  the  goal  of  maintaining 
the  system.  So  a  semantical-pragmatical  dimension  is  added. 

Thus  representation  is  another  step  in  the  evolution  of 
semiosis.  Due  to  the  presence  of  this  new  informational 
relation  biotic  systems  show  better  adaptability  to  their 
environment.  They  are  in  a  position  to  take  advantage  of  the 
environment  to  such  an  extent  that  they  can  reproduce 
themselves.  This  behavior  is  referred  to  as  intelligent 
behavior  (see  e.g.  [13]). 


3.3.  Decision-making  as  Most  Advanced  Form  of 
Semiosis 

Certainly  social  systems  which  involve  human  cognition  and 
communication  are  the  most  highly  developed  appearance  of 
information  processing  which  is  known  to  us  at  present. 

They  exhibit  even  greater  adaptability  than  mere  biotic 
systems:  they  alter  their  environment  to  suit  themselves. 
That  is  to  say,  their  sphere  of  influence  is  characterized  by  a 
feedback  loop,  through  which  the  systems  can  create  the 
conditions  necessary  not  only  for  their  reproduction,  but  also 
for  creating  themselves  according  to  the  goals  they  freely 
choose. The behavioral decisions are no longer
indistinguishable from the representations, but now are
conveyed with the knowledge via another phase transition as
knowledge is mediated with data (see Fig. 4). The process of
information includes the activities of perceiving, interpreting,
and evaluating, and shows separated dimensions of semantics
and pragmatics. Knowledge makes a difference as to wisdom,
that is, reasoned instruction for acting, which then deserves to
be called reasonable behavior of a system.


Figure 4. The Differentiation of Wisdom, Knowledge and Data
in Advanced Self-organizing Systems.

4. A Symmetry-breaking Evolution of
Semiosis

Summing  up,  a  stage  model  can  be  postulated  in  which  a 
process  of  symmetry  breaking  unfolds  from  a  stage  which  is 
characterized  by  simple  reflective,  self-organizing  systems  via 
a  stage  of  systems  which  produce  representations  toward  a 
stage  on  which  decision-making  systems  produce  perceptions, 
interpretations  and  evaluations  (see  Fig.  5). 


Figure 5. The Unfolding of Semiotic Aspects in the Course of
the Evolution of Self-organizing Systems.

Thus  a  stage  model  of  information  merges  the  system  aspects 
and  the  evolutionary  aspects  of  semiosis. 

In  each  phase  of  evolution  a  new  moment  appears  which 
becomes  characteristic  of  a  layer  for  more  highly  developed 
systems  and  gives  the  entire  layering  of  this  system  its 
nature. 

This model may serve methodologically as a framework for
theorizing information processes in all kinds of real-world
systems and for the elaboration of a UTI. It may also
assist the design of artificial systems which aid human
cognition and communication processes.

Abstract

This paper investigates the applicability of the
biological concept of aging to the performance of a
Genetic  Algorithm.  It  is  explained  how  the  natural 
phenomenon  of  decline  in  vitality  due  to  old  age  is 
translated  into  the  domain  of  Genetic  Algorithms, 
and  a  GA  implementation  is  tested  on  four  partly  NP- 
hard  problems.  The  basic  rationale  is  that  aging 
factors,  depending  on  the  problem  they  are  applied 
to,  can  greatly  influence  the  quality  of  the  results 
produced  by  the  GA,  and  thus  speed  up  or  slow  down 
the  search  process.  The  results  organized  by  test 
problem  are  presented,  and  compared  quantitatively. 
Rules  of  thumb  are  laid  out  as  to  how  influential  the 
aging  component  of  the  fitness  function  should  be 
chosen  in  order  to  achieve  the  best  search  results  for 
the  four  problems  treated  in  this  research. 

1.  Introduction 

In  1803,  T.R.  Malthus  [7]  stated  his  observation  that 
a  population  in  which  the  individuals  reproduce  will 
grow  geometrically  and  thereby  deplete  its  natural 
resources,  unless  it  is  regulated  by  the  mortality  of  its 
individuals.  Also,  species  of  immortal  individuals 
would  stop  evolving,  because  the  process  would  stall, 
once  all  ecological  niches  fit  for  life  had  been 
exploited. By putting a cap on the number of
individuals under competition and introducing new,
variant individuals, only death enables the dynamic
process of evolution. In other words, aging and
eventual death are nature's way of giving
someone else a chance.

Throughout most of the natural species that
reproduce sexually, it is a common fact that old
individuals lose vigor and are generally less likely
than young ones to survive in the "struggle for
existence" [4]. Also, natural individuals die of old age, by
which they completely retreat from reproduction.
Researchers  in  evolutionary  biology  state  several 
different  explanations  for  this  phenomenon.  Group 
selectionists  hold  the  point  of  view  that  dying  is  an 
act  of  altruism  for  the  rest  of  the  species.  They  place 
the  selection  process  on  the  level  of  a  race,  or  even 
the  entire  species,  which  strives  for  perfection  [14]. 
Old creatures would use up valuable resources for
younger and fitter individuals, and therefore slow
down the species' pace on its way to evolutionary
perfection. Such a breed would not be fit for
competition with other species, so it would die out in the
long run. Gene selectionists, on the other hand, think
that  there  are  lethal  genes  in  every  individual. 
Selection  simply  favors  lethal  genes,  which  are  either 
only  expressed  at  old  age,  or  at  least  are  suppressed 
until  after  reproduction.  According  to  these  theories, 
aging  actually  provides  species  with  an  advantage  in 
the  process  of  selection  [3].  Some  biologists  (e.g. 
[10])  have  gathered  evidence  that  aging  and  death 
might  be  consequences  of  deleterious  genes. 

Irrespective  of  the  viewpoint,  aging  is  a  major  factor 
in  the  system  of  natural  evolution.  Considering  its 
influence,  it  has  gone  remarkably  unnoticed  or  has 
been  passed  by  in  the  research  of  artificial  evolution, 
though  this  concept  can  easily  be  translated  into 
genetic  algorithms.  Holland's  original  and  most 
commonly  used  concept  of  Genetic  Algorithms  [15] 
can  be  viewed  as  strictly  generational.  The  life  span 
of  all  individuals  is  simply  one  generation.  After  that 
they  die  and  are  replaced  by  offspring  produced  from 
the  previous  generation.  This  very  drastic  change 
introduces  the  necessary  variation  into  the 
population,  but  it  does  not  pay  tribute  to  biological 
facts.  It  is  important  to  understand  that  death  does  not 
come  all  of  a  sudden.  It  is  rather  the  result  of  the 
dynamic  process  of  first  maturing,  and  then  aging. 
This detail has an important impact on the
evolution  of  a  species,  because  of  the  observation 
that  old  individuals  are  less  likely  to  survive  and  take 
part  in  another  mating  phase.  Evolutionary  strategies 
[2]  and  steady-state  Genetic  Algorithms  [9]  have  a 
different  approach.  Individuals  are  evaluated  at  the 
end  of  a  generation,  and  an  artificial  fitness  function 
determines  an  individual's  likelihood  to  make  the 
transition  into  the  next  generation.  The 
implementation  of  the  Reference  Genetic  Algorithm 
also follows this stochastic approach. Still, these
approaches make death come to an individual all too
suddenly. They do not consider that death is just the
outcome of the inherently gradual process of aging.
Age does not play a role at all in an individual's
probability to be selected into the next generation,
nor does it influence the reproduction process.
Nature does not work this way. This is why
researchers in the field of artificial life, who are
interested in simulating the emergence of life with
the help of computers, came up with different ideas to
simulate the natural processes of aging and dying. In
Thomas Ray's Tierra system [12], self-replicating
programs consisting of assembly language
instructions for a virtual machine compete for
computer memory and CPU time. Although in Tierra
senescence does occur, the ideas of aging and death
are based on the individual's success in coping with
the system's requirements during its lifetime. As a
consequence, individuals complying perfectly with
Tierra's rules will live forever, unless more than 80
percent of the environment is filled up with infallible
programs. This can also lead to stalling in the
evolutionary process. This phenomenon of stalling is
the  reason  why  Peter  Todd  [13]  in  his  system  to 
study  the  evolution  of  behavior  in  a  simulated 
environment  introduced  a  new  kind  of  artificial 
death.  An  individual  perishes  if  it  runs  out  of  energy, 
so  the  only  notion  of  aging  in  his  original  world  is 
running  out  of  energy.  Still,  Todd's  examination  of 
this  system  exhibited  the  evolution  of  immortal 
creatures,  whose  main  feature  was  that  they  chose  not 
to  take  part  in  the  very  costly  process  of  producing 
offspring.  The  conclusion  in  [13]  recommends  that 
new  forms  of  death  are  needed  to  prevent  immortals 
from  bringing  the  evolutionary  process  to  a  stop. 

Research in the field of Genetic Algorithms has
largely ignored the notion of age inherent in biological
systems. In traditional Genetic Algorithms, where in
each new generation the entire population is replaced
by its offspring, individuals can die too quickly to
spread good genes. In purely stochastic approaches,
on the other hand, immortal individuals can slow
down the evolutionary search process, or make it get
stuck in local plateaus of the search space. The
reason for this is that old individuals use up the
resources for young ones. If they do not take part in
the mating process a sufficient number of times, then
they do not introduce any variation into the
population; they just sit in the population being an
obstacle to evolution. The population does not profit
from good genes they might have. Realizing this
deficiency and inspired by the attempts made in
Artificial Life to overcome this very unnatural way
of generational transition, the authors examined the
influence of a specific type of aging on the process and
results of genetic search.

The remainder of this paper addresses this topic.
Section 2 below describes a basic Genetic Algorithm
(GA) that is used in this work as a baseline: its
results are compared to those found by the GA that
implements the aging concept. The latter algorithm is
described in section 3. The following four sections
describe and compare the application of both algorithms
to a variety of problems. First among these is the 0/1
knapsack problem (KSP) [8] in section 4. It is followed in
section 5 by the problem of optimizing the amount of
resources needed to survey an area of special interest
(ASP). In section 6 we apply these algorithms to the
iterated version of the Prisoners' Dilemma Problem
(PDP) [1]. Section 7 is dedicated to the examination
of a timetable scheduling problem (SCP). Finally, in
section 8 we draw some conclusions about the
general applicability of the aging concept in the field
of evolutionary search.

2.  The  Reference  GA 

The basic, traditional GA as presented here
implements a genetic algorithm as described in
[Michalewicz1996]. In the following this GA will be
referred to as the Reference GA. It incorporates three
basic concepts of the evolutionary search procedure.

Selection process: Individuals that represent
solutions to the problem are evaluated according
to a fitness function that measures the
quality of a solution. A solution's chance of
survival is proportional to its fitness.

1st Genetic Operator - Crossover: A certain
percentage of the individuals in the population
are randomly chosen to mate and produce
offspring. The children's genetic code is a
recombination of the parents' genome. This
introduces the variation necessary to make the
evolutionary search process work.

2nd Genetic Operator - Mutation: The second
genetic operator randomly picks units of genetic
information from the gene pool and modifies
them. By this procedure the altered gene can
adopt a state that is not yet present in the
population, which again increases the level of
variety among the solutions.

These three concepts are incorporated into the
repetitive process described in the following.
Initially, half the population of bit strings is created
randomly. To increase variation, the other half starts
out as the bitwise complement of the first half. The
first activation of the selection process then
determines those individual solutions that survive to
form the second generation. Survivors of the
generational transition are drawn from a probability
distribution based on the individuals' fitness values.
In other words, the probability for an individual to be
selected to participate in the next generation is
proportional to its fitness as defined by the problem to
solve. A metaphor Michalewicz uses for this process
is that of spinning a roulette wheel, where each
individual has a reserved slot. The width of that slot is
directly proportional to its fitness, and the
circumference of the entire roulette wheel equals the
sum of all slots for the individuals in the population.


Figure  1:  The  selection  roulette.  Each  slot 
delimited  by  dashed  lines  represents  the  fitness 
value  for  one  individual  solution.  The  different 
widths  of  the  slots  stand  for  the  fact  that  there  are 
fitter  and  less  fit  individuals  in  the  population. 
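
The initialization and roulette-wheel selection described here can be sketched as follows. This is a minimal illustration in Python rather than the paper's Scheme; the function names, the `rng` argument, and the uniform fallback for a wheel whose slots all have zero width are assumptions of the sketch, not part of the original implementation.

```python
import random

def init_population(size, n_bits, rng):
    """Create size//2 random bit strings plus their bitwise complements."""
    half = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(size // 2)]
    complements = [[1 - bit for bit in ind] for ind in half]
    return half + complements

def roulette_select(population, fitness, rng):
    """Spin the wheel once per individual; slot width is proportional to fitness."""
    weights = [fitness(ind) for ind in population]
    if sum(weights) == 0:
        # Assumption: if every slot has zero width, choose uniformly instead.
        weights = [1] * len(population)
    chosen = rng.choices(population, weights=weights, k=len(population))
    return [list(ind) for ind in chosen]  # copies, since an individual may be picked twice
```

A fit individual can appear several times in the returned list, which is the effect the text describes for populations with a few very fit solutions.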

Maintaining a fixed number of solutions in the
population is achieved by spinning the roulette wheel,
with one slot for each individual, exactly as many
times as there are individuals in the population. Since
the slot for an individual is sized according to the
individual's fitness, the roulette ball is more likely to
fall into the slot of a fitter individual. Note that it is
possible (and even very likely in a population
composed of a few very fit solutions and many unfit
solutions) that some individuals are chosen several
times to take part in the next generation.

The crossover operator simulates sexual reproduction.
It is applied to the population that results from the
selection process. Initially, each individual is assigned
a random number between 0 and 1. All individuals
whose number is smaller than the crossover
probability, a user-defined parameter of the GA, take
part in the mating process. Here, solutions gather in
pairs to recombine their genomes. The method
applied is single-point crossover, which means that
both parents exchange the part of their genetic code
that follows a randomly chosen position in their bit
strings.
Example: Two parents $p_i$ and $p_j$ are mated. Their
genome strings are

$p_i = (a_1, a_2, \ldots, a_{pos}, a_{pos+1}, \ldots, a_N)$

$p_j = (b_1, b_2, \ldots, b_{pos}, b_{pos+1}, \ldots, b_N)$

After crossover at position $pos$ the resulting
individuals are

$p_i' = (a_1, a_2, \ldots, a_{pos}, b_{pos+1}, \ldots, b_N)$

$p_j' = (b_1, b_2, \ldots, b_{pos}, a_{pos+1}, \ldots, a_N)$
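
The crossover step in this example can be sketched in Python as follows (a sketch only, not the paper's Scheme code). How the chosen individuals are paired is not stated in the text; pairing them in order of appearance and leaving an unpaired individual unchanged are assumptions of this sketch, as are all function names.

```python
import random

def single_point_crossover(parent_i, parent_j, pos):
    """Exchange the tails of two equal-length bit lists after position pos (1-based)."""
    child_i = parent_i[:pos] + parent_j[pos:]
    child_j = parent_j[:pos] + parent_i[pos:]
    return child_i, child_j

def crossover_population(population, p_cross, rng):
    """Mate individuals whose random draw is below p_cross; children replace parents."""
    chosen = [k for k in range(len(population)) if rng.random() < p_cross]
    if len(chosen) % 2 == 1:
        chosen.pop()  # assumption: an individual without a partner is left unchanged
    result = [list(ind) for ind in population]
    for a, b in zip(chosen[0::2], chosen[1::2]):
        pos = rng.randint(1, len(result[a]) - 1)  # requires at least 2 bits
        result[a], result[b] = single_point_crossover(result[a], result[b], pos)
    return result
```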

The children resulting from crossover replace their
parents in the population. The population produced
by crossover is then subjected to mutation. Here
another parameter determines the likelihood for one
bit of genetic information in the gene pool to be
flipped. This is done by assigning a random number
to each bit in every individual, and then inverting the
bit's value if its random number is smaller than the
mutation rate parameter.
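
A minimal Python sketch of this bit-flip mutation is given below; the function name and the `rng` argument are illustrative assumptions.

```python
import random

def mutate(population, p_mut, rng):
    """Draw a random number per bit and flip the bit if the draw is below p_mut."""
    return [[1 - bit if rng.random() < p_mut else bit for bit in ind]
            for ind in population]
```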

One  such  sequence  of  selection,  crossover,  and 
mutation  is  called  a  generation.  A  limit  is  placed  on 
the  number  of  generations  that  the  GA  can  try  to 
evolve  a  good  solution  to  the  problem  it  is  applied  to. 
If criteria are known for the fitness of an optimal
solution, then the evolutionary process is usually
stopped as soon as such an individual is found.
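
The generational loop with its two stopping conditions might be organized as in the following Python sketch. The operators are passed in as functions so that the loop stays independent of any particular problem; the signature and the `target` parameter are assumptions made for illustration.

```python
def run_ga(population, fitness, select, crossover, mutate,
           max_generations, target=None):
    """Repeat selection, crossover, and mutation up to max_generations times.

    Stops early once the best fitness seen reaches target (if one is given).
    Returns the best individual found and the number of generations run.
    """
    best = max(population, key=fitness)
    generations_run = 0
    for _ in range(max_generations):
        if target is not None and fitness(best) >= target:
            break
        population = mutate(crossover(select(population)))
        generations_run += 1
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best, generations_run
```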

3.  Implementation  of  Biological  Aging 

In  order  to  test  the  effect  of  aging,  the  Reference  GA 
described  in  the  previous  section  has  to  be  adapted 
mainly  in  two  ways.  First  of  all,  the  representation  of 
an  individual  needs  to  be  extended  to  include 
information  about  its  age  (i.e.,  the  number  of  times  it 
has survived the selection process). Secondly, the
term fitness has to be redefined to encompass the
idea of penalizing old individuals that have not
procreated. To serve the first purpose, each
individual  solution  is  represented  as  a  pair.  Its  first 
element  is  the  list  of  bits  encoding  the  solution's 
genome.  The  second  element  contains  the  number  of 
times  this  particular  individual  has  been  chosen  to 
take  part  in  the  next  generation. 

Example: The dotted pair

((0 1 1 1 1 0 0 1) . 15)

represents a creature with the genome string
(0 1 1 1 1 0 0 1) which has survived 15 generations.

Individuals created during the population
initialization before the first generation, as well as the
child solutions that result from crossover, start out
with a zero in their age component. If one or more of
an individual's genes were subject to mutation, the
individual's age component is also reset to zero. This
rejuvenation is done because changing even a single
gene can introduce an important allele that is not yet
part of the population's gene pool. Such an individual
might be too valuable to justify penalizing it for its old
age.
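
The bookkeeping rules for the age component can be summarized in a short Python sketch, in which a tuple `(genome, age)` stands in for the Scheme dotted pair. The function names are hypothetical.

```python
def newborn(genome):
    """Initial individuals and crossover children start with age zero."""
    return (genome, 0)

def survive(individual):
    """An individual chosen for the next generation grows one generation older."""
    genome, age = individual
    return (genome, age + 1)

def after_mutation(individual, mutated_genome):
    """Reset the age to zero if mutation changed at least one gene."""
    genome, age = individual
    if mutated_genome != genome:
        return (mutated_genome, 0)
    return (genome, age)
```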

Apart from the adaptation of the representation, the
selection process has to be changed to take the age of
a solution into account. Aging is supposed to have a
negative influence on the creature's fitness. Still, age
does not play a role in the problem-specific,
user-defined fitness function. The influence of age
should be transparent to this external fitness. On the
other hand, the impact of aging on the process should
also be quantifiable for the user. To satisfy all these
constraints, the influence of aging on the fitness of an
individual in the selection process is exerted in a
special, internal fitness function. Assigning higher or
lower values to the parameter AGE_FACTOR regulates
the actual importance of aging.

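(define (InternalFF ind)
  ;; ind is a dotted pair (genome . age)
  (let ((fitness (ExternalFF ind)))
    (if (= 0 fitness)
        0                                  ; zero external fitness stays zero
        (max 0                             ; never return a negative fitness
             (+ fitness
                (* (ExternalFF BEST_EVER)  ; best solution found so far
                   AGE_FACTOR
                   (/ 1 (1+ (cdr ind)))))))))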

The internal fitness of a solution is its external fitness
increased by a portion of the external fitness value of
the best solution found by the GA up to the time of the
call. The size of this increase is AGE_FACTOR times
that best value, divided by the individual's age plus
one, so it shrinks as the individual grows older. An
external fitness value of zero always yields an internal
fitness of zero, and the internal fitness never falls
below zero. The selection procedure, which decides
about the survivors of a generation, is based only on
the internal fitness. The Reference GA with the
changes laid out in this section will be called Aging
GA in the rest of this article.
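
For readers less familiar with Scheme, the internal fitness function can be restated in Python roughly as follows. This is a sketch: the argument names are illustrative, and passing the external fitness function and the best-ever individual as arguments (instead of using globals) is a choice made here for self-containment.

```python
def internal_fitness(individual, external_ff, best_ever, age_factor):
    """Age-adjusted fitness of an individual given as a (genome, age) pair.

    external_ff maps an individual to its problem-specific fitness;
    best_ever is the best individual found so far.
    """
    _, age = individual
    fitness = external_ff(individual)
    if fitness == 0:
        return 0
    adjustment = external_ff(best_ever) * age_factor / (1 + age)
    return max(0, fitness + adjustment)
```

With a positive `age_factor` the adjustment is a bonus that is largest for newborn individuals; with a negative one it becomes a deduction that is largest for newborn individuals and fades with age.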

4.  The  0/1  Knapsack  Problem 

The first problem for which the differences between
the Reference GA and the Aging GA are examined is
the well-studied 0/1 KSP. It belongs to the class of
NP-hard problems [Garey & Johnson 1979], so it is
reasonable to approach it with heuristics like a
Genetic Algorithm. Imagine a traveler trying to fill
his backpack. The task is complicated by the fact that
there are more objects to pack than the knapsack can
carry. He has to decide which objects to take with him
on the trip, and which objects to leave at home. In
other words, he has to find the best packing for his
rucksack, so that it does not break but he can still
transport the items that are most valuable to him. In
the example instance of the KSP that both Genetic
Algorithms try to solve there are n=20 objects, and
the capacity of the knapsack is C=54. The lists W and
V store the weights and values of the objects that can
be packed. One near-optimal packing, with a value of
214 and a total weight of 54, is to choose objects 2, 3,
4, 7, 8, 10, 12, 13, 14, 17, 18, and 20; the best packing
reaches a value of 215 at the same weight. Table 1
lists the objects with their weights and values.

(define W '(1 1 2 2 4 4 6 6 10 10
            1 1 2 2 4 4 6 6 10 10))

(define V '(3 6 5 8 3 7 18 30 2 40
            3 6 5 8 3 7 18 30 2 40))

The individual solutions (i.e. the knapsack packings)
are represented by lists in which each element
represents one component of the vector x. A 1 in the
i-th component of vector x therefore indicates that
object number i is part of the particular packing,
whereas a 0 means that it is not in the knapsack.

Example: The representation of a solution that
suggests packing objects 4, 6, 8-11, 16, and 19 would
be the following list:

(0 0 0 1 0 1 0 1 1 1 1 0 0 0 0 1 0 0 1 0)

By looking up each activated object's weight and
value, one can determine that the solution's weight
sum is 47, and its values add up to 99.
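
The lookup in this example can be reproduced with a few lines of Python. The lists mirror W and V from the text; the helper name `totals` is an assumption of this sketch.

```python
W = [1, 1, 2, 2, 4, 4, 6, 6, 10, 10] * 2   # weights of objects 1..20
V = [3, 6, 5, 8, 3, 7, 18, 30, 2, 40] * 2  # values of objects 1..20
C = 54                                      # knapsack capacity

def totals(x):
    """Return (weight sum, value sum) of the objects whose bit in x is 1."""
    weight = sum(w for w, bit in zip(W, x) if bit)
    value = sum(v for v, bit in zip(V, x) if bit)
    return weight, value

example = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
print(totals(example))  # (47, 99)
```

A packing is feasible when its weight sum does not exceed `C`; the example packing is feasible because 47 is below 54.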

The genetic algorithm creates an initial population of
forty random solutions like the one in the example
above. Here, as well as throughout the entire
evolutionary process, the GA can generate solutions
in which the weight sum of the packed objects
exceeds the capacity of the knapsack. In the following
these solutions are referred to as invalid (or
infeasible) because they violate the constraint that a
solution must satisfy to be considered feasible. The
decision was made to lift the weight constraint from
the solutions, and thereby transform the constrained
KSP into an unconstrained problem. Still, in order to
favor valid packings, invalid solutions are penalized
in proportion to their degree of invalidity. This was
achieved by decreasing the individual's fitness
according to its excess weight. The penalty is
calculated by the function


$\mathrm{Pen}(x) = C \left( \sum_{i=1}^{n} x[i]\, W[i] - C \right)$

The value of this function increases with the
individual's excess weight. The entire fitness
function can be stated as

$\mathrm{Fitness}(x) = \left( \sum_{i=1}^{n} V[i] \sum_{i=1}^{n} W[i] \right) - \sum_{i=1}^{n} \left( V[i] - x[i]\, V[i] \right) - \mathrm{Pen}(x)$

Both  GAs  were  extensively  tested  for  this  problem. 
840  runs  of  200  generations  each  were  conducted  to 
determine  the  average  and  mean  values  for  both 
GAs'  weight  and  value  sum.  The  results  are  shown  in 
Table  2. 

Table 2: Results for the KSP

             | Reference GA | Aging GA (AGE_FACTOR = 9.06)
W Average    | 52.075       | 52.274
W Mean       | 52.119       | 52.226
Std. Dev.    | 0.337        | 0.278
Optimality   | 96.4%        | 96.8%
V Average    | 188.979      | 190.261
V Mean       | 189.036      | 190.311
Std. Dev.    | 2.240        | 1.456
Optimality   | 87.9%        | 88.5%

The level of optimality in this table compares the
average values of both GAs to the perfect solution,
which uses the full capacity of the backpack to pack
items with a total value of 215 units. Interpreting the
results, one can say that both the average weight and
the average value sum of the packings produced by
the Aging GA are closer to the optimal value. Although
the improvement is within the range of the standard
deviation around the average of the Reference GA
values, the results indicate that aging can have a
positive effect on the results of the KSP. The impact
of aging is very sensitive to changes of the
AGE_FACTOR parameter for the internal fitness
function. Although the value 9.06 was the best that
the authors found during their test phase, an
exhaustive search for the best parameter value was
not attempted.
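
The reference value of 215 and the optimality percentages for the value sums can be checked independently. The sketch below uses the standard dynamic-programming solution of the 0/1 knapsack problem in Python; it is a verification aid added here and not one of the algorithms compared in the paper.

```python
def knapsack_optimum(weights, values, capacity):
    """Best total value of a 0/1 packing whose weight does not exceed capacity."""
    best = [0] * (capacity + 1)            # best[c]: optimum for capacity c
    for w, v in zip(weights, values):
        # Go through capacities downwards so each object is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

W = [1, 1, 2, 2, 4, 4, 6, 6, 10, 10] * 2
V = [3, 6, 5, 8, 3, 7, 18, 30, 2, 40] * 2
optimum = knapsack_optimum(W, V, 54)

def optimality(average, optimum):
    """Average result as a percentage of the optimum, rounded to one decimal."""
    return round(100 * average / optimum, 1)
```

For this instance the optimum is 215, and `optimality(188.979, optimum)` and `optimality(190.261, optimum)` give the 87.9% and 88.5% reported for the value sums.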

5.  The  Area  Surveillance  Problem 

The setting for the second problem is a 2-dimensional
plane, parts of which are of special interest. These
parts  are  later  referred  to  as  the  surveillance  area.  A 
surveillance  area  consists  of  a  set  of  different 
geometric  objects.  The  types  of  these  objects  are 
circles,  half  planes,  annuli  (i.e.  the  intersection  of  one 
circle  with  the  complement  of  another  concentric 
circle),  convex  polygons,  pies,  and  area  segments.  In 
addition  to  the  surveillance  area,  there  are  cover 
objects.  Their  shapes  can  be  chosen  from  the  same 
range  of  types  as  those  of  the  surveillance  area. 
Cover  objects  are  meant  to  cover  a  maximal  part  of 
the  surveillance  area.  All  objects  (surveillance  area  as 
well  as  cover  objects)  have  a  permanent  location. 
Once  assigned  a  position  in  the  plane  during  the 
problem  description  phase,  they  do  not  change  it  in 
the  course  of  the  problem  solution.  Another  feature 
of  the  cover  objects  is  that  operating  them  bears 
certain  costs.  These  costs  are  mainly  influenced  by 
the  amount  of  surveillance  area  they  cover,  and  by 
their  position  in  the  plane.  Cover  objects  can  be  in 
one  of  two  states:  either  they  are  engaged  (switched 
on), or they are disengaged (switched off). Each
cover object covers a distinct part of the surveillance
area once it is engaged. Cover objects can intersect.
All objects (surveillance area as well as cover
objects) are represented as the set of points in the
plane contained in the object. The number of points
in an object is proportional to the resolution used to
describe the plane.

The goal is to maximize the percentage of
surveillance area covered by cover objects, and at the
same time to minimize the costs resulting from
switching on cover objects. Depending on the
specific situation, a trade-off has to be found between
those two goals. The exact preferences of this
trade-off are defined by the user. The vertex cover
problem [6] can easily be reduced to the ASP, which
shows that the ASP is also NP-hard.

In the ASP considered here, a set of 15 cover objects
is used to cover an area of 2 polygons and 2 annuli.
Table 3 shows the objects and their activation costs,
while Figures 2 and 3 sketch the locations of the
surveillance area and the cover objects in the plane.


Figure  2  :  The  area  to  be  surveyed 

Table 3: Cover objects and their cost factors for the ASP

Object | 1   | 2   | 3   | 4   | 5   | 6   | 7
Costs  | .50 | .45 | .40 | .15 | .35 | .32 | .10

Object | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15
Costs  | .20 | .25 | .22 | .22 | .28 | .18 | .18 | .15


Figure  3:  The  cover  objects 


A  solution  to  the  problem  of  surveying  an  area  is 
represented  as  a  list  of  15  bits.  A  0  in  position  m  of 
the  solution  means  that  the  cover  object  m  is 
disengaged,  whereas  a  1  would  mean  that  it  is 
switched  on. 

Example: The solution list (0 1 0 0 1 1 0 0 0 0 0 0 0
0 1) would mean that the surveillance area is only
covered by the cover objects number 2, 5, 6, and 15.
Such a solution list also functions as one individual
in the Genetic Algorithm.
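
Decoding such a solution list is straightforward, as the following Python sketch shows. The cost factors are taken from Table 3; adding them up for the engaged objects is only an illustration, since the text does not spell out how the cost factors enter the fitness function. The function names are assumptions.

```python
# Cost factors of cover objects 1..15 (Table 3).
COSTS = [.50, .45, .40, .15, .35, .32, .10,
         .20, .25, .22, .22, .28, .18, .18, .15]

def engaged_objects(solution):
    """Numbers (1-based) of the cover objects that are switched on."""
    return [m + 1 for m, bit in enumerate(solution) if bit]

def cost_factor_sum(solution):
    """Sum of the cost factors of all engaged cover objects (illustrative)."""
    return sum(cost for cost, bit in zip(COSTS, solution) if bit)

example = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(engaged_objects(example))  # [2, 5, 6, 15]
```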

For  the  ASP  both  the  Reference  GA  and  the  Aging 
GA  made  400  runs  of  15  generations.  The  population 
size  was  set  to  60  solutions.  Table  4  presents  the 
results. 


Table 4: Results for the ASP

             | Reference GA | Aging GA (AGE_FACTOR = -1.62)
Average      | 126.791      | 127.972
Mean         | 126.547      | 128.037
Std. Dev.    | 1.0226       | 1.873
Optimality   | 93.6%        | 94.4%

As before, the average and mean fitnesses of the
Aging GA reach a value which is closer to the fitness
value of the best possible solution. However, both
GAs had enough time to evolve close-to-optimal
solutions. A particularity that can be observed here is
that the aging factor has a negative algebraic sign.
This can be interpreted as support for older
individuals in the selection process. The optimal
solution to which the average fitnesses of both GAs
were compared had a fitness value of 135.5.

7.  The  Scheduling  Problem 


The Scheduling Problem (SCP) is also a member of
the category of NP-hard problems. The particular
problem addressed herein may be described as
follows. For a small school with four teachers and
four rooms, a timetable for 20 courses has to be
created by a GA, so that each of the courses is
scheduled in one of 8 possible time slots. The
timetable produced has to comply with certain
constraints. A constraint violation leads to a
penalty on the schedule's fitness. The smallest
penalty is placed on assignments which violate
constraints between courses. This happens if two
courses are scheduled in the same time slot although
they should not be held in parallel (e.g., because
students ought to be enrolled in both classes in the
same semester). A medium penalty is imposed on
professor constraint violations. Such a penalty is
given to timetables which schedule a professor to
teach two courses at the same time. The heaviest
penalty affects timetables that schedule two different
courses at the same time and in the same room. An
individual schedule consists of 20 genes with 7 bits
each. The first 2 bits of each gene represent one of
the 4 rooms in which the courses can take place. The
next 3 bits determine the time slot (out of 8 possible
slots) during which the course is scheduled. The last
2 bits assign one of the 4 professors to the course.
One gene is also referred to as a schedule item.
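
The gene layout can be illustrated with a small Python decoder. Reading each bit field as an unsigned binary number with the most significant bit first, and numbering rooms, slots, and professors from 1, are assumptions of this sketch; the text fixes only the field widths.

```python
def bits_to_int(bits):
    """Interpret a list of bits as an unsigned integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

def decode_gene(gene):
    """Split a 7-bit schedule item into (room, slot, professor), all 1-based."""
    room = bits_to_int(gene[0:2]) + 1       # 2 bits: rooms R1..R4
    slot = bits_to_int(gene[2:5]) + 1       # 3 bits: time slots S1..S8
    professor = bits_to_int(gene[5:7]) + 1  # 2 bits: professors P1..P4
    return room, slot, professor

def decode_schedule(genome):
    """Decode a 140-bit genome into 20 schedule items, one per course."""
    return [decode_gene(genome[k:k + 7]) for k in range(0, len(genome), 7)]
```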

The fitness function chosen for this problem
processes three matrices in order to keep track of
constraint violations. The first matrix statically stores
all conflicts between courses. The second and third
matrices are dynamically updated during the
evaluation of a schedule. One of the two records the
courses that take place in a particular room at a
certain timeslot. The other one stores, for each
possible pair of teacher and timeslot, whether a
professor is already assigned to teach a course at that
time. The fitness function examines each single
schedule item in the following manner. If a room has
already been assigned to another course during the
same period of time, then the fitness function assigns
a fitness value to this item that is only 3.3% of what a
conflict-free schedule item would have scored.
Otherwise the function tests the schedule item for
course conflicts and professor conflicts. In case of a
course conflict, the schedule item scores 10% of a
conflict-free item, while a professor conflict results in
5% of the optimal score per item. Table 5 shows a
conflict-free schedule that is in keeping with the
course conflict matrix that was chosen for the test
runs. The columns represent the four rooms R1
through R4, while the rows show the eight timeslots
S1 through S8. Each entry Cn/Pm in cell (Rj,Sk)
stands for the scheduling of course Cn in room Rj at
slot Sk, and the assignment of professor Pm to teach
the course.

Table 5: A perfect schedule without constraint violations

     | R1     | R2     | R3     | R4
S1   | C1/P1  | C6/P2  | C11/P3 | C16/P4
S2   | C2/P1  | C7/P2  | C12/P3 | C17/P4
S3   | C3/P1  | C8/P2  | C13/P3 | C18/P4
S4   | C4/P1  | C9/P2  | C14/P3 | C19/P4
S5   | C5/P1  | C10/P2 |        | C20/P4
S6   |        |        | C15/P3 |
S7   |        |        |        |
S8   |        |        |        |
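
The per-item scoring described in this section might look as follows in Python. Several details are assumptions of the sketch rather than statements of the paper: sets and a dictionary replace the three matrices, a conflict-free item is worth a hypothetical 10 points (chosen so that 20 conflict-free items total 200, which appears consistent with the optimality percentages in Table 6), and an item with both a course conflict and a professor conflict receives the lower of the two scores.

```python
ITEM_SCORE = 10.0  # hypothetical score of a conflict-free schedule item

def schedule_fitness(items, course_conflicts):
    """Score a schedule given as a list of (room, slot, professor) per course.

    course_conflicts is a set of frozenset({c1, c2}) pairs (1-based course
    numbers) that must not be held in the same time slot.
    """
    room_taken = set()    # (room, slot) pairs already in use
    prof_taken = set()    # (professor, slot) pairs already in use
    courses_in_slot = {}  # slot -> courses scheduled there so far
    total = 0.0
    for course, (room, slot, prof) in enumerate(items, start=1):
        if (room, slot) in room_taken:
            factor = 0.033                   # room conflict: 3.3%
        else:
            factor = 1.0
            others = courses_in_slot.get(slot, [])
            if any(frozenset((course, o)) in course_conflicts for o in others):
                factor = min(factor, 0.10)   # course conflict: 10%
            if (prof, slot) in prof_taken:
                factor = min(factor, 0.05)   # professor conflict: 5%
        total += factor * ITEM_SCORE
        room_taken.add((room, slot))
        prof_taken.add((prof, slot))
        courses_in_slot.setdefault(slot, []).append(course)
    return total
```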

The SCP was tested in 840 runs of 300 generations
each. The number of schedules in the population was
30. Table 6 exhibits greater differences than those
noted for the previous two problems.


Table 6: Results for the SCP

             | Reference GA | Aging GA (AGE_FACTOR = -0.56)
Average      | 152.524      | 172.687
Mean         | 152.579      | 172.558
Std. Dev.    | 0.971        | 1.1604
Optimality   | 76.3%        | 86.3%

The average result of the Aging GA is 10 percentage
points closer than that of the Reference GA to a
schedule that complies with all constraints. Here, as
before, it is important to mention that this drastic
improvement was only observed for the exact value
of -0.56 for the influence of aging. Other positive and
negative values for aging produced worse results,
most of which even scored below the Reference GA's
average fitness.

8.  The  Prisoners'  Dilemma  Problem 

The last problem that is used to show the importance
of aging in the field of Genetic Algorithms is quite
exceptional. The goal here is not so much to solve a
computationally expensive problem as to learn more
about the emergence of cooperative behavior among
selfish individuals. The PDP comes from the field of
game theory. Initially, the PDP was formulated as a
two-person game in which two players, who have
committed a crime together, are held in separate cells
without any means of communication. Both players
A and B are offered the same deal: if one player
defects and betrays the other while the other does not
testify, the defector will not be imprisoned while the
cooperative player will serve a 5-year sentence. If
both players defect, their testimony will be
discredited and they will both be put away for 4
years. However, if neither player testifies against the
other, each will only serve a 2-year sentence. A game
consists of two moves, one by each player. The
players independently decide which move to make,
namely either to defect (D) or to cooperate with the
other player (C). The following table describes the
pay-off function for each player. The number in the
pay-off columns indicates by how many years a
player shortens his time in prison relative to the
5-year sentence.


Table 7: Pay-off in years saved for Player A and Player B in the PDP

Player A   | Player B   | Pay-off A | Pay-off B
Defects    | Defects    | 1         | 1
Defects    | Cooperates | 5         | 0
Cooperates | Defects    | 0         | 5
Cooperates | Cooperates | 3         | 3

In this article we examine the case where the game is
iterated several times. If in a repeated game both
players always defect, they will get a far lower total
pay-off than in the case of mutual cooperation. Each
player has a memory of the last 3 moves. Based on
the experience of these 3 moves, a decision is made
whether the player's next move will be D or C. To
represent a strategy that decides what to do in the
next move based on the history of the previous 3
games, we follow Axelrod's approach. Details about
the strategy representation can be found in [1].
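
The pay-off table and the iterated game can be sketched in Python as follows. Strategies are modeled here as plain functions of the recent history, which is a simplification for illustration; it is not the bit-string representation of [1], and the three example strategies are not taken from the paper.

```python
# (move of A, move of B) -> (years saved by A, years saved by B), as in Table 7.
PAYOFF = {("D", "D"): (1, 1), ("D", "C"): (5, 0),
          ("C", "D"): (0, 5), ("C", "C"): (3, 3)}

def always_defect(history):
    return "D"

def always_cooperate(history):
    return "C"

def tit_for_tat(history):
    """Cooperate first, then repeat the opponent's previous move."""
    return history[-1][1] if history else "C"

def play(strategy_a, strategy_b, games=10, memory=3):
    """Play an iterated game; each strategy sees its last `memory` games
    as a list of (own move, opponent move) pairs. Returns both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(games):
        move_a = strategy_a(hist_a[-memory:])
        move_b = strategy_b(hist_b[-memory:])
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b
```

Over 10 games, mutual cooperation yields 30 points per player and mutual defection only 10, which is the gap the text refers to.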

To determine the fitness of a strategy, each player
plays against every other player 10 times. For the
population size of 20 chosen here this means 200
games per individual and generation. The fitness
value that is assigned to a strategy is the average total
score that it achieves in each of the 20 one-on-one
encounters with another strategy. During 840 runs of
100 generations each, and with a population size of
20 individuals, the following results were recorded.


Table 8: Results for the PDP

             | Reference GA | Aging GA (AGE_FACTOR = 13.5)
Average      | 30.363       | 31.933
Mean         | 30.433       | 32.015
Std. Dev.    | 0.570        | 0.809
Optimality   | 60.7%        | 63.9%

The standard deviations around the average fitnesses
of both GAs justify the statement that the
improvement achieved by the Aging GA is
significant. The optimality measure applied here
compares the average result of each GA to a
theoretically perfect strategy that always scores the
maximum pay-off of 5 years. Taking this into
consideration, an increase of 3.2 percentage points in
the optimality of the average fitness appears even
more significant.

9. Conclusion

This research introduced the concept of biological
aging to the domain of traditional Genetic
Algorithms. The effects of aging on solving four
well-examined optimization problems have been
shown to be significant. Further research in this field
has to be done to develop reliable theories about the
relation between the problem domain and the degree
of aging that should be applied. A rule of thumb
appears to be that moderate values, which do not
override the effect of the actual, external fitness
function, are most appropriate. However, no signs of
linearity between the impact of aging and the results
of the optimization process were detectable. This
suggests applying a Meta-GA to the problem of
finding the best aging influence factor for a task.

1. Abstract

The  practical  deployment  of  distributed  agent-based 
systems  mandates  that  each  agent  behave  sensibly.  This 
paper  focuses  on  the  development  of  flexible,  responsive, 
adaptive  systems  based  on  Sensible  Agents.  Sensible 
Agents  perceive,  process,  and  respond  based  on  an 
understanding  of  both  local  and  system  goals.  Each  agent 
is  capable  of  (1)  deliberative  or  reactive  planning  and 
execution of one or more domain-specific services or tasks, (2)
maintaining  and  interpreting  knowledge  about  states, 
events,  and  goals  related  to  itself,  other  agents,  and  the 
environment,  and  (3)  adapting  its  behavior  according  to  its 
understanding  of  its  own  local  goals  and  overall  system 
goals.  This  paper  addresses  the  above  issues  in  the  context 
of  applied  semiotics,  a  field  that  analyzes  and  develops  the 
formal  tools  of  knowledge  acquisition,  representation, 
organization,  generation,  enhancement,  communication, 
and  utilization.  A  Sensible  Agent  architecture  has  been 
developed  where  each  agent  is  composed  of  five  modules:  a 
Self-Agent  Modeler,  an  External  Agent  Modeler,  an  Action 
Planner,  an  Autonomy  Reasoner,  and  a  Conflict  Resolution 
Advisor. 

Keywords:  autonomous  agents,  dynamic  organizations, 
adaptive  systems,  planning 

2.  Research  Motivation 

There  are  two  critical  aspects  to  the  old  maxim,  "It  was 
the  best  decision  that  could  be  made  under  the 
circumstances."  "Best  decision"  and  "circumstances" 
reflect  both  the  quality  of  the  decision  and  the  confining 
constraints.  If  decisions  were  unconstrained,  the  best 
decision  would  depend  solely  on  relationships  between 
the  decision  variables  and  a  suitable  measure  of  decision 
quality.  Consideration  of  the  circumstances  reflects  a 
more  realistic  view  for  decision-makers.  The  state  of 
the world or environment in which the problem presents
itself acts as a set of constraints restricting the
application or acquisition of knowledge.

Research investigations have explored the
decision-making process and attempted to automate
portions of it. A large body of work has framed this problem as
follows:  Given  a  goal,  how  should  the  system  search 
through  a  set  of  solutions  (a  solution  space)  to  arrive  at 
the  optimal  goal  state  given  a  relatively  static  picture  of 
the  world?  In  other  words,  what  is  the  best  decision  or 
plan  if  the  goal  and  constraints  guiding  the  search  for 
that  goal  remain  constant?  Underlying  the  premise  of  a 
constrained  solution  space,  a  limited  set  of 
communications  and  controls  is  allowed  for  derivation 
of  feasible  plan  sets. 

The  true  complexity  of  many  problem  domains  (e.g. 
military  command  and  control)  is  reflected  by  the 
following  facts  which  make  the  above  assumptions 
impractical  for  dynamic  applications: 

•  Goals  are  not  static. 

•  Multiple  types  of  goals  exist. 

•  Goals  and  plans  to  achieve  those  goals  can  conflict 
with  one  another. 

•  Circumstances  change  often  and  unpredictably. 

Therefore, treating decision-making as a process in
which some entity (human or automated) searches
through a set of possible solutions, known a priori and
guided only by static constraints, to arrive at its
goals merely scratches the surface of the real problem.
To  address  the  true  complexity  of  dynamic  and 
unpredictable  situations,  it  is  important  to  understand 
that  making  the  "best  decision  under  the  circumstances" 
involves  not  only  intellect  but  the  ability  to  be 
innovative  (make  changes),  flexible  (adjust  to  change) 
and  responsive  (react  or  execute  readily). 


This  research  is  supported  in  part  by  the  Texas  Higher  Education  Coordinating  Board  #003658452  and  the  Applied  Research  Laboratories 
and  Office  of  Naval  Research  Grant  N00014-96-1-0298. 

Decisions rarely occur in isolation. A decision-maker
must  not  only  assess  its  own  behaviors  (its  states  and 
the  events  it  is  capable  of  processing)  but  also  the 
behaviors  of  others  possessing  the  ability  to  impact  its 
circumstances.  Additionally,  the  decision-maker  must 
recognize  the  impact  or  necessity  of  interacting  with 
others  to  develop  and  execute  plans.  These  interactions 
and  behaviors  impact  decisions,  often  acting  as  "the 
circumstances"  driving  a  need  to  make  a  decision. 
Therefore, intellect or knowledge, as well as innovation,
flexibility, and responsiveness, are often derived from a
decision-maker's ability to dynamically 1) assess current
and potential roles others play in interactions and 2)
establish beneficial roles in these interactions. As
situations  change,  the  interactions  supporting  the 
decision-making process must also change. Simply
put, the decision-maker must not only be smart but
also capable of applying those "smarts," deciding whom to
work with, if anyone, in an appropriate fashion and at the
right time. To address these issues, dynamic
configuration  of  organizations  is  a  must. 

Organizations  of  decision  makers  must  be  capable  of 
configuring  themselves  given  dynamic  constraints.  For 
a  member  of  the  organization,  these  constraints  include 
the  state  of  the  decision-maker,  events  triggering 
responses,  states  and  events  observed  from  the 
organization members and the environment, the decision-
maker's own goals, and the goals of the system of which
it is a member. The research presented here is
driven by these motivations for dynamic organizations
and by the need for underlying theory and
technology to provide this capability.

3.     Research  Overview 

The  practical  deployment  of  distributed  agent-based 
systems  in  dynamic,  complex  environments  mandates 
that  each  agent  behave  sensibly,  incorporating  an 
understanding  of  both  global  system  goals  and  its  own 
local  goals.  Sensible  Agents,  capable  of  Dynamic 
Adaptive  Autonomy,  address  many  of  the  challenges 
encountered  by  multiagent  systems.  A  Sensible  Agent 
(1)  maintains  a  representation  of  both  behavioral 
knowledge  and  declarative  knowledge  of  itself  and  other 
system  agents,  (2)  develops  and  processes  an 
understanding  of  both  local  goals  and  system  goals,  and 
(3) performs either deliberative or reactive actions with
respect to its own internal events or external events
from other system agents or the environment.

A  critical  consideration  for  this  behavior  is  the  agent's 
level of autonomy. The term level of autonomy refers
to the types of roles an agent plays in its planning
interactions  with  other  agents.  Specifically,  this 
research  seeks  to  prove  the  following  hypothesis:  The 
operational  level  of  agent  autonomy  is  key  to  an  agent's 
ability  to  respond  to  dynamic  situational  context,  (i.e. 
the  states,  events,  and  goals  that  exist  in  a  multiagent 
system),  conflicting  goals,  and  constraints  on  behavior. 
Levels of autonomy are defined along a spectrum. (1)
Command-driven: the agent does not plan and must
obey orders given by another agent; (2) Consensus:
the agent works as a team member to devise plans; (3)
Locally Autonomous: the agent plans alone,
unconstrained by other agents; and (4) Master: the
agent devises plans for itself and its followers, who are
command-driven. These conceptual autonomy levels are
tied to the responsibility of an agent to plan for solving
its goals.
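
As an illustration only, the four levels of the spectrum can be written down as a small enumeration. This is a minimal sketch, not the authors' implementation: the names `AutonomyLevel` and `participates_in_planning`, and the goal names in the example, are hypothetical and introduced here purely for exposition.

```python
from enum import Enum


class AutonomyLevel(Enum):
    """The four points on the autonomy spectrum described in the text."""
    COMMAND_DRIVEN = 1      # does not plan; obeys orders from another agent
    CONSENSUS = 2           # devises plans jointly as a team member
    LOCALLY_AUTONOMOUS = 3  # plans alone, unconstrained by other agents
    MASTER = 4              # plans for itself and its command-driven followers


def participates_in_planning(level: AutonomyLevel) -> bool:
    """Return True when an agent at this level takes part in planning."""
    return level is not AutonomyLevel.COMMAND_DRIVEN


# Autonomy is held per goal, so one agent may sit at different levels
# for different goals (the goal names below are made up).
goal_autonomy = {
    "avoid_interference": AutonomyLevel.CONSENSUS,
    "track_target": AutonomyLevel.LOCALLY_AUTONOMOUS,
}
planning_goals = [g for g, lvl in goal_autonomy.items()
                  if participates_in_planning(lvl)]
```

Keying the level by goal rather than by agent mirrors the text's statement that autonomy is tied to an agent's responsibility to plan for each of its goals.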

Sensible  Agents  maximize  the  innovation,  flexibility 
and  responsiveness  of  multiagent  planning  systems 
operating  under  dynamic  and  constrained  military 
situations.  Consequently,  the  Sensible  Agent 
technology  will  increase  the  ability  of  agents  to 
coordinate  the  following: 

•  planning  activities  driven  by  an  understanding  of  both 
local  and  system  goals  as  well  as  the  behaviors  of 
other  system  agents  and  the  environment. 

•  the  roles  individual  planning  agents  play  in 
interactions  with  other  agents  by  allowing  dynamic 
assignment  of  autonomy  (i.e.  dynamic  assignment  of 
responsibility,  commitment,  authority  and 
independence)  with  respect  to  each  goal  held  by  a 
respective  agent. 

4.     Related  Work 

Several  researchers  have  addressed  the  development  of 
adaptive,  intelligent,  agent-based  systems.  These  efforts 
range  from  representing  coordination  and  collaboration 
[1],  to  production  rule-based  systems  for  organizational 
self-design  [2],  to  partial  global  planning  [3],  to  the 
development  of  multiagent  planning  architectures  [4], 
and  many  others.  The  majority  of  previous  work  has 
focused  on  providing  architectures  with  a  constrained  set 
of  control  schemes  for  agent  interaction.  In  other  words, 
one  organization  of  decision-makers  is  developed  for  a 
problem  domain.  Although  some  new  architectures 
provide  the  mechanisms  through  which  different 
organizations  can  be  realized,  these  architectures  do  not 
support  the  reasoning  processes  necessary  to  implement 
self-reconfiguration  by  the  agents  themselves. 
Therefore,  a  significant  portion  of  this  work  makes  an 
a-priori  assignment  of  autonomy  to  each  agent  in  the 
system. The Sensible Agent architecture and
underlying technology push this envelope in an effort
to  provide  dynamic  configuration  of  organizations 
allowing  adaptive  control  of  agent  interaction  to 
maximize  their  intellect  and  minimize  the  limitations 
imposed  by  ever-changing  constraints  (e.g. 
circumstances). 

This  research  advances  previous  work  in  the  field, 
realizing  that  one  specific  level  of  autonomy  may  not 
be  appropriate  or  even  achievable  at  all  times  and  under 
all  conditions.  Investigations  of  related  research  have 
not  identified  a  comprehensive  method  for  agents  to 
employ  to  select  autonomy  levels  given  dynamic 
situations. Adaptive autonomy, the ability to transition
between autonomy levels during system operation, provides
agents the ability to dynamically choose the most
appropriate scheme of behavior and interaction. This
research  seeks  to  provide  agents  with  the  capability  to 
reason  about  autonomy.  This  research  effort  does  not 
focus  on  improving  the  problem-solving  methods 
available  at  particular  autonomy  levels.  Many  ongoing 
research  efforts  focus  on  optimizing  the  planning 
process.  Sensible  Agents  make  use  of  existing  problem- 
solving  methods  that  best  fit  their  particular  tasks.  To 
accomplish  this,  agents  must  know  how  to  plan  at  each 
of the possible levels of autonomy. Previous and
ongoing research provides inroads to these capabilities.
The  command  driven  /  master  agent  relationship 
employs  traditional  centralized  planning  or  distributed 
centers  of  planning  control  [5][6].  Planning  under  the 
influence  of  a  master  agent  has  also  been  examined 
specifically [7].  On  the  other  hand,  agents  developing 
plans  for  consensus  must  negotiate  to  reach  an 
agreement  by  which  all  goals  are  satisfied.  The 
contract-net  protocol  and  its  extensions  offer  agents  this 
capability  [8].  Additional  negotiation  mechanisms 
[9][10]  provide  additional  support  for  agent  interactions 
at  the  consensus  level  of  autonomy. 

Locally autonomous agents, which plan independently
but  act  as  part  of  a  larger  system,  exhibit  the  most 
diverse  behavior.  They  may  be  fully  cooperative  and  act 
under  a  functionally  accurate,  cooperative  distributed 
system  (FA/C)  [11][12][13].  Alternatively,  they  may 
choose  to  act  more  selfishly.  Durfee  and  Lesser's  [3] 
method  of  communicating  partial  global  plans  allows 
agents  to  interact  within  a  system  in  many  different 
ways.  However,  they  leave  the  method  of  choosing  an 
interaction  style  open.  Locally  autonomous  agents  are 
unique  in  that  they  may  cooperate  to  solve  a  system 
goal  without  communicating.  This  function  can  be 
supported  by  Tenney  and  Sandell's  work  on 
coordination [14][15]. If the autonomy level of agents
in a system is allowed to change, the effective
organizational  structure  of  the  system  also  changes. 
For  this  reason,  it  is  important  to  understand  the  effect 
of  different  organizational  structures.  Previous  work 
provides  this  insight  in  the  form  of  formal  studies  of 
large-scale  dynamic  systems  [14]  and  investigations  of 
dynamic  creation  and  destruction  of  agents  [16].  The 
mechanics  of  changing  organizational  structure  is  also 
an important area of study. Several researchers have
investigated  how  agents  can  form  problem-solving 
groups  [17][1].  In  addition,  centralized,  distributed,  and 
group  approaches  have  been  compared  [18][19].  These 
studies  provide  important  insight  for  autonomy 
reasoning.  Gasser  and  Ishida  [2]  also  provide  an 
understanding  of  both  adaptive  self-configuration 
(through  Organization  Self-Design)  and  tradeoffs 
corresponding  to  different  organizational  structures. 

5.     Research  Approach 

The  Sensible  Agent  architecture  (Figure  1)  permits 
dynamic  adaptation  of  agent  autonomy.  One  agent  is 
highlighted  and  referred  to  as  the  self-agent  in  the 
discussions  below. 


Figure 1: Overall Architecture of Agents with Dynamic Adaptive Autonomy

A Sensible Agent's behaviors are described by its
internal states, events and goals. A Sensible Agent's
understanding of the external world is built from
interpretations  of  the  states,  events,  and  goals  of  other 
agents  and  the  environment.  Each  agent  is  (1) 
responsible  for  one  or  more  domain-specific 
service/tasks,  (2)  responsible  for  the  maintenance  of  its 
internal  states,  events,  and  goal  structure,  (3)  capable  of 
interpreting  external  states,  events,  and  goals,  (4) 
capable  of  detecting  and  resolving  conflicts,  and  (5) 
capable  of  assigning  and  adapting  an  autonomy  level  for 
each  goal  in  the  agent's  goal  structure.  Each  agent 
consists  of  five  major  modules. 

•  The Self-Agent Modeler contains the behavioral
model of the agent. This module interprets internal or
external events acting on the agent and changes its
state accordingly [20][21][22][23]. Other modules
(within  the  self-agent)  can  access  this  model  for 
necessary  state  information. 

•  The  External  Agent  Modeler  contains  knowledge 
about  other  agents  and  the  environment.  This  module 
maintains  beliefs  about  states  and  events  external  to 
the agent and predicts the actions of other agents
[20][21][22][23][24].  Other  modules  within  the  self- 
agent  can  monitor  this  model  for  changes  external  to 
the  self  agent  that  affect  their  reasoning  processes. 

•  The Action Planner solves domain problems, stores
agent goals, and executes problem solutions. This
module  interacts  with  the  environment  and  other 
agents  in  its  system.  It  also  carries  out  solutions  for 
conflict  resolution.  Communication  between  agents 
is  handled  by  this  module.  The  Action  Planner  draws 
information  and  guidance  from  all  other  modules. 

•  The  Conflict  Resolution  Advisor  module  identifies, 
classifies,  and  generates  possible  solutions  for 
conflicts  occurring  between  the  self-agent  and  other 
agents  [25][26].  This  module  monitors  the  action 
planner,  self-agent  modeler,  and  external  agent 
modeler  to  identify  conflicts.  It  then  offers 
suggestions  to  the  action  planner  or  Autonomy 
Reasoner  in  order  to  resolve  the  conflict. 

•  The  Autonomy  Reasoner  determines  the  appropriate 
autonomy  level  for  each  goal,  assigns  an  autonomy 
level  to  each  goal,  and  reports  autonomy-level 
constraints to other modules in the self-agent. It also
handles all autonomy level transitions and requests for
transition made by other agents. The autonomy
reasoner  contains  utility-based  assessment  of 
autonomy  levels.  The  autonomy  reasoner  draws 
information  from  all  other  modules. 

All  modules  within  a  Sensible  Agent  are  domain 
independent  except  for  the  action  planner.  Specifically, 
these  modules  contain  domain  independent 
representations  and  reasoning  mechanisms. 
Instantiations (i.e. knowledge instantiated in a particular
representation) are, of course, domain dependent. For
example,  domain  independent  behavioral  representations 
in  the  self  agent  models  specify  how  states,  events,  and 
transitions  are  represented.  The  actual  instantiations  of 
states,  events,  and  goals  and  their  respective 
relationships  are,  of  course,  domain  dependent. 
Analogously,  reasoning  mechanisms  for  detecting  and 
resolving  conflicts  are  domain  independent  while  the 
actual  execution  utilizes  knowledge  instantiations  (e.g. 
goal  priorities)  which  are  very  much  domain  dependent. 

Organization Formation: Autonomy Representation and Reasoning

The autonomy
spectrum (i.e. command-driven, consensus, locally autonomous and
master)  conveys  the  intent  and  scope  of  each  autonomy 
level.  However,  these  intentions  must  be  formalized 
and  modeled  computationally  in  order  to  implement 
Dynamic  Adaptive  Autonomy  (DAA)  in  agent-based 
systems.  Autonomy  levels  are  represented  by  four 
autonomy  constructs:  Responsibility  (R):  a 
measure  of  how  much  the  agent  must  plan  to  see  a  goal 
solved;  Commitment  (c):  a  measure  of  the  extent  to 
which  a  goal  must  be  solved;  Authority  (A):  a 
measure  of  the  agent's  ability  to  access  system 
resources; Independence (i): a measure of how freely
the  agent  can  plan. 

The  autonomy  level,  AL,  is  a  4-tuple  (R,  c,  A,  i). 
These  four  autonomy  constructs  provide  the  foundation 
for  a  complete  computational  model  of  an  agent's 
autonomy  level.  The  autonomy  constructs  can  be  used 
in  numerous  ways  to  reason  about  agent  autonomy, 
guide  agent  behavior,  and  facilitate  system  interactions 
and  formation  of  problem-solving  groups.  It  is  these 
four  constructs  that  the  Autonomy  Reasoner 
manipulates  to  perform  autonomy  level  assignments 
and  transitions. 

A  representation  specifying  how  the  different  autonomy 
constructs  affect  utility  (i.e.  costs  and  benefits)  follows. 
Utilities  for  a  proposed  autonomy  level  allow 
determination  of  the  feasibility  for  the  agent  to  assume 
the  proposed  autonomy  level  with  respect  to  the  goal. 
An agent assigns two utility values ($U_{agent}$, $U_{system}$) to
each goal, where $U_{agent}$ represents the goal's value to the
agent itself, and $U_{system}$ represents the goal's value to the
agent's overall system.

Let $A$ denote the set of agents in the system. Let $g^i_j$
denote the $j$th goal of the agent $A^i$. The autonomy level
of goal $g^i_j$ is represented by the 4-tuple $(R, c, A, i)$. Let
$f_c$ be a function that returns a tuple $(f_{c,agent}, f_{c,system})_j$
using the commitment index, $c$, for a specific goal, $g^i_j$,
where $f_{c,agent}$ and $f_{c,system}$ denote the impact of the
commitment index on the agent and the system,
respectively. Similarly, we define $f_R$ and $f_i$, which denote
the impact of the responsibility distribution and the
independence index, respectively. Let $f_A$ denote the cost
of accessing resources used by the agent to fulfill goal
$g^i_j$.¹ It is important to note that the value of $f_{c,system}$
(and the other functions) is not necessarily consistent
among agents; differences may appear due to the
perceptions each agent has about the system.

Let $F^i_c$ denote $(F_{c,agent}, F_{c,system})$ for a given agent $A^i$.
$F_{c,agent}$ and $F_{c,system}$ are computationally derived from all
the individual values of $f_{c,agent}$ and $f_{c,system}$ from the
current goals of the agent. Similar values are derived
for $F^i_R$, $F^i_A$, and $F^i_i$. In this research, utility represents
the degree of benefit that an agent's goals provide an
agent. The utility of all an agent's goals to the agent
will therefore be given by:

$$U_{agent} \propto h_1(F_{c,agent}, F_{R,agent}, F_{A,agent}, F_{i,agent}) \qquad \text{(Eq 1)}$$

$U_{agent}$ is assumed to be positively correlated with the
four parameters of function $h_1$. Similarly, $U_{system}$ is
given by:

$$U_{system} \propto h_2(F_{c,system}, F_{R,system}, F_{A,system}, F_{i,system}) \qquad \text{(Eq 2)}$$

Again, $U_{system}$ may not be consistent among all agents.
Let $U^i$, given by $(U_{agent}, U_{system})$, denote a utility vector
for the agent. This utility vector is used in determining
the benefit that can be attained from a goal. Let $A^i$ be an
agent that has accepted a candidate-goal, $g_n$. Let the
candidate-goal's possible autonomy level assignments
be represented by the set of levels $\{L_0, L_1, \ldots, L_m\}$. In
selecting a level, $A^i$ is assigning an autonomy level to
$g_n$ and must therefore predict the utility vector associated
with each autonomy level.
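
The selection step described above can be sketched as follows. This is a hedged illustration under strong assumptions, not the paper's method: the text does not specify the functions h_1 and h_2, so the sketch substitutes simple weighted sums (which are positively correlated with their inputs when the weights are non-negative), feeds them the raw construct values rather than the aggregated F terms, and ranks levels by a weighted mix of the two utilities. The names `AutonomyConstructs`, `predict_utility`, `select_level` and `system_bias` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Weights = Tuple[float, float, float, float]


@dataclass(frozen=True)
class AutonomyConstructs:
    """The 4-tuple (R, c, A, i) for one goal; numeric scales are assumed."""
    responsibility: float
    commitment: float
    authority: float
    independence: float


def predict_utility(al: AutonomyConstructs,
                    agent_weights: Weights,
                    system_weights: Weights) -> Tuple[float, float]:
    """Return (U_agent, U_system) using weighted sums as stand-ins for h_1, h_2."""
    values = (al.commitment, al.responsibility, al.authority, al.independence)
    u_agent = sum(w * v for w, v in zip(agent_weights, values))
    u_system = sum(w * v for w, v in zip(system_weights, values))
    return u_agent, u_system


def select_level(candidates: Dict[str, AutonomyConstructs],
                 agent_weights: Weights,
                 system_weights: Weights,
                 system_bias: float = 0.5) -> str:
    """Pick the candidate level whose predicted utility vector scores highest."""
    def score(name: str) -> float:
        u_agent, u_system = predict_utility(
            candidates[name], agent_weights, system_weights)
        # Assumed scalarisation: blend local and system utility.
        return (1.0 - system_bias) * u_agent + system_bias * u_system
    return max(candidates, key=score)
```

For instance, calling `select_level` with two candidate levels and equal weights returns the name of the level with the larger summed construct values; how the two components of the utility vector should really be traded off is left open by the text.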

6.  Summary 

Whether  acting  as  completely  automated  decision 
makers  or  acting  as  assistants  for  decision  support, 
Sensible  Agents  will  allow  planning  systems  to 
perform  with  the  kind  of  innovation,  flexibility,  and 
responsiveness  demanded  of  dynamic  situations  (e.g. 
dynamic  requirements  planning  resources  such  as  data, 
communications  and  time).  Significant  progress  has 
been  made  in  the  area  of  intelligent  planning  systems 
and  should  not  be  discarded.  The  Sensible  Agent 
architecture provides a mechanism to leverage legacy
planners while providing these planners the ability to
dynamically adapt their level of autonomy.

1 For simplicity, the subscripts i and j, which denote the current
goal, $g^i_j$, of agent $A^i$, have been omitted from the description of
functions $f_c$, $f_R$, $f_A$, and $f_i$. When a distinction between two goals is
necessary, we will use the corresponding subscripts.

Both  the  focus  of  previous  research  and  the  progression 
of  systems  engineering  and  architecture  work  offer  a 
critical  foundation.  While  a  good  foundation  exists, 
current  work  is  limited  by  a  rather  static  view  of  the 
architectures  to  support  decision  making.  The 
messages, or at least the content of messages, may be
dynamic in current architectures. Yet static interactions
(i.e. pre-defined control mechanisms) for planning and
execution force the systems engineer of the decision-
making system, rather than the situation in which the
decision-making system operates, to dictate the available
interactions for planning and execution. We have
developed  simulations  for  the  naval  radar  frequency 
management  problem  which  support  our  claims 
regarding  the  limitations  of  static  organizations.  These 
limitations  include  the  following: 

•  Overuse  of  scarce  communication  resources  in  some 
circumstances 

•  Slow response due to fixed communication and
negotiation protocols that do not vary with
circumstances (e.g. adaptation of interactions based on
geographical distribution of emitters, available
spectrum resources, and other factors).

•  Inability  to  reason  effectively  about  third  party 
interference  sources  and  intentions  (sources  that  are 
not  members  of  the  system). 

A  Sensible  Agent's  ability  to  1)  reason  about  the  utility 
of  autonomy  levels  with  respect  to  both  local  and 
systems  goals  and  2)  assign  autonomy  levels  given 
utility-based  assessments  provides  significant  advances 
in  the  development  of  dynamic,  responsive  systems. 
Abstract

Achieving  autonomous  learning  systems  which  can  govern 
themselves  is  one  of  the  goals  of  A.I.  Most  learning  systems  explore 
a  fixed  model  space  to  explain  a  set  of  data.  We  believe  that  the 
"best"  but  most  distinct  models  in  the  available  space  can  provide 
insight  into  questions  of  autonomy  such  as  when  to  change  the  model 
space  and  how  to  generate  new  data  points  (via  experiments).  We 
explore  this  idea  by  focusing  on  clustering  problems  where  the  initial 
data  is  known  to  be  insufficient  to  find  the  true  model.  We  propose  a 
method  to  generate  new  data  points  via  experiments.  Our  approach 
results in convergence to the true model using half as many
additional data points as random selection requires.

KEYWORDS:  Autonomous  learning,  machine  discovery, 
clustering,  unsupervised  learning 

1.  Introduction  and  Motivation 

If inductive learning aims at answering the question, "What
does the data tell us?", autonomous learning adds the
question, "What can we now do to better understand the
domain?".

So what can we do to better understand the domain to which
we are applying our learning system? Inductive learning, like
most artificial intelligence problems, is inherently a search
through a predefined model space. Most inductive learning
tools, whether they be unsupervised [1] or supervised [2],
primarily  focus  on  finding  the  single  best  model  with  respect 
to  some  criterion  for  a  fixed  set  of  data.  This  is  quite  adequate 
if  the  tool  is  to  be  used  by  a  human  who  can  interpret  the 
results  and  make  appropriate  changes.  To  make  such  tools 
autonomously  learn  more  about  a  domain  we  must  address 
problems  of  how  to  change  the  model  space  and  how  to 
generate  new  data.  It  is  our  belief  that  finding  and  using 
multiple  models  can  provide  insight  into  these  more  complex 
questions  associated  with  autonomous  learning. 
In this paper we focus on using multiple models to answer the
question, "Given the current data and the best model(s) found,
what should the next set of experiments be in order to
find the true model for the domain?". Which model is better
for  a  given  set  of  data  has  been  addressed  by  the  minimal 
encoding  length  approach  independently  proposed  by  Wallace 
(1968)  [1]  and  Rissanen  (1978)  [3].  Their  approach  has  the 
benefit  that  complex  models  are  chosen  over  simpler  ones 
only  if  the  data  available  justifies  it.  But  to  our  knowledge  the 
approach  provides  no  indication  of  how  to  generate  new  data 
points.  Whilst  we  focus  on  this  question  in  this  discourse,  we 
believe our approach could be used to determine how to
change the model space and to address other questions associated
with autonomous learning. We intend to explore these at a later
time.

This  paper  documents  our  approach  for  finding  and  using 
multiple  models  for  clustering  problems  otherwise  known  as 
unsupervised  learning.  The  paper  is  divided  into  a  further  six 
sections.  The  first  is  a  basic  introduction  to  clustering  which 
provides  the  terminology  used  throughout  this  paper.  In  the 
next  section  we  define  in  limited  detail  the  clustering  system 
we  have  developed  (a  more  complete  description  exists  [4]). 
The  criterion  used  to  evaluate  each  model  (the  minimum 
message  length)  and  our  search  mechanism  (simulated 
annealing)  are  described.  The  subsequent  sections  outline  how 
we  search  the  model  space  to  find  multiple  models  and  then 
how  these  can  be  used  to  answer  our  next  experiment 
question. The final two sections discuss and conclude our
current work and touch on future research.

2.  An  Introduction  To  Clustering 

Clustering,  also  called  unsupervised  or  intrinsic  classification, 
has  a  long  history  in  numerical  taxonomy  [5]  and  machine 
learning  [6].  Clustering  attempts  to  find  groups  within  data  so 
as  to  better  understand  the  domain  the  data  is  from.  It  has  been 
applied  to  generation  of  taxonomies  for  flora  and  fauna, 
concept  formation  and  data  mining.  The  objects/entities  to  be 
clustered  are  each  described  by  a  set  of  d  attributes.  Clustering 
involves  determining  the  number  of  classes  (groups),  a 
description  for  each  class  which  can  be  used  to  determine 
membership  and  assigning  each  object  to  one  or  more  of  these 
classes.  As  the  number  of  classes  is  unknown  and  no  pre- 
classified  training  set  exists,  clustering  is  unsupervised.  The 
collection  of  classes  and  their  descriptions  form  a 
taxonomy/model  of  the  objects. 

A clustering system contains three major parts: the knowledge
representation scheme (KRS), which defines the searchable
model space; the criterion, which provides a "goodness"
measure for each model; and the search mechanism, which
explores the model space attempting to find the model which
leads to the optimal criterion value.

The  KRS  determines  the  type  of  classes  and  their  possible 
interrelationships.  A  dichotomy  for  clustering  options  which 
impact  on  the  KRS  has  been  defined  elsewhere  [7].  The 
criterion  evaluates  the  "goodness"  of  each  of  the  models.  It  is 
usually a real-valued function that takes as parameters the
objects  and/or  class  descriptions  and  is  the  objective  function 
of  the  search. 
The search mechanism explores the model space attempting to
find  the  best  model  by  finding  the  optimal  (either  minimum  or 
maximum)  value  of  the  objective  function.  For  all  but  the 
most  restrictive  model  spaces  the  number  of  possible  models 
to  evaluate  is  combinatorially  large.  Exhaustively  evaluating 
each  model  is  not  even  considered  as  a  search  mechanism.  The 
search mechanism must consistently find the global optimum or
at least a good local optimum in a number of different
application  domains  with  a  minimum  of  computation. 
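
To give a feel for "combinatorially large", the sketch below counts only the ways of exclusively assigning n labelled objects to non-empty classes (Stirling numbers of the second kind, summed into Bell numbers). It is an illustration added here, not part of the described system, and it understates the real model space because it ignores the class descriptions; the function names are our own.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n labelled objects into k non-empty classes."""
    if n == k:
        return 1  # covers the empty partition of zero objects as well
    if k == 0 or k > n:
        return 0
    # The n-th object either joins one of the k existing classes
    # or forms a new class on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n: int) -> int:
    """Total number of exclusive partitions of n objects, over all class counts."""
    return sum(stirling2(n, k) for k in range(n + 1))
```

Even ten objects admit more than a hundred thousand distinct partitions, and the count grows faster than exponentially with n, which is why exhaustive evaluation is ruled out.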

3.  Our  Clustering  System 

The  clustering  system  developed  merges  together  two 
problem-invariant  (robust)  technologies:  the  minimum 
message  length  criterion  (MML)  and  simulated  annealing 
(SA). This has so far been shown to result in a clustering system
which can be applied to a number of different problems with
minimal changes. The objective of our search is to
minimize the message length for non-hierarchical,
probabilistic classes to which objects are exclusively assigned.
However,  most  large  and  interesting  search  problems  possess 
many  local  optima  [14].  We  feel  that  SA  is  a  good  search 
mechanism  to  explore  these  complex  model  spaces,  since  it 
can  escape  local  minima  [14].  In  the  following  sub-sections 
we  describe  the  two  technologies. 

3.1 The Minimum Message Length Criterion

Chaitin [8], Kolmogorov [9] and Solomonoff [10] in varying
forms independently proposed algorithmic information theory
(AIT). AIT intuitively allows us to quantify the notion of
complexity and compressibility of objects. Learning by
induction is inherently about compressing observations (the
objects) into a theory (the model). Boyle's law ($P = k \cdot N/V$) on
ideal gases relates the number of molecules ($N$) in a
measurable closed volume ($V$) to pressure ($P$). A table could
store  every  possible  combination  of  N  and  V  and  the  resultant 
pressure.  However,  Boyle's  law  compresses  this  table  into  a 
much  shorter  description,  the  above  equation. 
Wallace and Boulton [1] extend this compressibility notion
into their minimum message length (MML) approach to
induction. They define a criterion which can be used to select
the most probable model from a given set of mutually
exclusive and exhaustive models, $H_i$, for the objects, $D$. The
MML approach specifies that the minimal encoding of the
model and the objects given the model is the best. In terms of
Bayes' theorem, we wish to maximise the posterior
distribution, $P(H_i \mid D, c)$, where $c$ is the background context:

$$P(H_i \mid D, c) = \frac{P(H_i \mid c)\, P(D \mid H_i, c)}{P(D)} \qquad (1)$$

Taking the negative logarithm of both sides gives:

$$-\log P(H_i \mid D, c) = -\log P(H_i \mid c) - \log P(D \mid H_i, c) + \text{const} \qquad (2)$$


Our  interest  is  in  comparing  relative  probabilities  so  we  can 
ignore  const.  Information  theory  [Shannon]  tells  us  that  -log 
(P (occurrence))  is  the  minimum  length  in  bits  to  encode  the 
occurrence.  Hence  by  minimising  equation  (2)  we  inherently 
maximise  the  posterior  distribution  and  find  the  most  probable 
model.  The  expression  to  minimise  has  two  parts,  the  first 
being  the  encoding  of  the  model  and  the  second  the  encoding 
of  the  objects  given  the  model.  The  object  collection  is 
random  if  the  size  of  encoding  the  model  and  the  objects  given 
the  model  is  approximately  equal  to  the  size  of  directly 
encoding the objects. That is, there is no way to compress the
objects into a shorter description/theory. The two-part message
is precisely described for intrinsic non-hierarchical
classification in [1] and [11].
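
The two-part message can be illustrated with a toy calculation. This is a minimal sketch assuming the prior P(H|c) and the likelihood P(D|H,c) of each candidate model are already available as plain numbers; the actual encoding for classification is the one given in the cited references, and the model names and probabilities below are invented for the example.

```python
import math
from typing import Dict, Tuple


def message_length_bits(prior: float, likelihood: float) -> float:
    """Bits to state the model plus bits to state the objects given the model."""
    return -math.log2(prior) - math.log2(likelihood)


def best_model(models: Dict[str, Tuple[float, float]]) -> str:
    """Return the name of the model with the shortest two-part message."""
    return min(models, key=lambda name: message_length_bits(*models[name]))


# Hypothetical (prior, likelihood) pairs for two competing models.
candidates = {
    "simple": (0.6, 0.01),   # cheap to state, fits the data less well
    "complex": (0.1, 0.05),  # costly to state, fits the data better
}
chosen = best_model(candidates)
```

Because the message length is the negative logarithm of prior times likelihood, the model with the shortest message is also the one with the largest posterior; here the better fit of the complex model does not pay for its longer description.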

The  MML  criterion  only  defines  a  goodness  measure  for  a 
model  with  an  inherent  bias  towards  simple  models.  It  does 
not  indicate  how  to  search  the  model  space.  To  do  that  we  use 
simulated  annealing. 

3.2  Searching  The  Model  Space  Using  Simulated 
Annealing 

The  Metropolis  criterion  was  first  used  as  a  Monte  Carlo 
method  for  the  evaluation  of  state  equations  in  statistical 
mechanics  by  Metropolis  et  al.  [12].  Kirkpatrick  et  al.  [13] 
demonstrated  how  using  the  Metropolis  criterion  as  a  test  in 
iterative  optimisation  can  solve  large  combinatorial 
optimisation  problems.  They  called  their  approach  the 
simulated  annealing  technique  as  it  mimics  the  annealing  of  a 
piece  of  metal  to  minimise  the  energy  state  of  the  molecules 
within  the  metal.  SA  is  an  iterative  local  optimisation 
technique.  At  any  time  there  is  only  one  current  solution 
which is slightly changed at each iteration. As SA is a Markov
process, the current solution, $S_n$, at time $n$, is a result of the
perturbation of solution $S_{n-1}$. The algorithm continually
perturbs  the  current  solution  to  generate  new  candidate 
solutions.  SA  unconditionally  accepts  candidates  of  better 
quality  than  the  previous  solution  and  conditionally  accepts 
those  of  a  worse  quality  with  a  probability  p,  where: 


$$p = \exp\left(-\frac{|\text{Difference in Quality}|}{\text{System Temperature}}\right) \tag{3}$$


Taking  the  logarithm  of  this  expression  yields 


Worse  quality  solutions  can  be  accepted  which  allows  the 
search  to  escape  from  local  minima  which  are  common  in 
most  complex  search  problems  [13].  We  set  the  initial 
temperature  To,  so  there  is  a  90%  probability  of  accepting  an 
increase  in  cost.  This  probability  decreases  as  the  temperature 
decreases. The cooling constant, R, reduces the temperature
such that $T_k = T_{k-1} \cdot R$. The initial temperature is

$$T_0 = -\frac{C_0}{\log_e(0.9)} \tag{4}$$

where $C_0$ is the goodness evaluation of the initial solution.


The  implementation  of  our  algorithm  can  be  found  in  [4]. 
Simulated  annealing  statistically  guarantees  that  the  global 
optimum  will  be  found,  if  the  thermodynamic  equilibrium  is 
reached  at  every  temperature  and  the  cooling  schedule  is  slow 
enough  [14].  However,  this  is  an  infinitely  long  process.  We 
do  not  maintain  these  two  requirements  due  to  the  need  to  find 
a  solution  in  finite  time.  Instead,  after  a  fixed  number  of 
iterations  at  a  temperature,  the  temperature  is  reduced  and  the 
cooling  constant  provides  discrete  changes  in  the  temperature. 
However  non-ideal  SA  approaches,  such  as  the  one  we  use, 
still  find  good,  local  optima  solutions  [14]. 
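
The search loop described in this section can be sketched as follows. This is a minimal illustration and not the implementation referred to in [4]: the function names (`anneal`, `accept`, `initial_temperature`), the cooling constant 0.95 and the iteration counts are assumptions chosen for the example, and the cost function stands in for the MML message length.

```python
import math
import random


def initial_temperature(initial_cost, accept_prob=0.9):
    """T0 = -C0 / ln(p0), as in equation (4); initial_cost must be positive."""
    return -initial_cost / math.log(accept_prob)


def accept(delta_cost, temperature, rng):
    """Metropolis test: always accept improvements, otherwise accept a
    worse candidate with probability exp(-|delta| / T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def anneal(initial, cost, perturb, cooling=0.95, steps_per_temp=100,
           n_temps=200, seed=0):
    """Non-ideal annealing: a fixed number of iterations per temperature,
    then a discrete temperature drop T_k = T_{k-1} * R."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = initial_temperature(current_cost)
    for _ in range(n_temps):
        for _ in range(steps_per_temp):
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)
            if accept(candidate_cost - current_cost, temperature, rng):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost


# Toy usage (hypothetical problem): minimise (x - 3)^2 + 1 over the integers.
best, best_cost = anneal(
    50,
    cost=lambda x: (x - 3) ** 2 + 1,
    perturb=lambda x, rng: x + rng.choice((-1, 1)),
)
```

Tracking the best solution seen so far is a convenience added in this sketch; the text only requires a single current solution.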

4.  Finding  Multiple  Models 

Our  thesis  is  that  distinct  but  good  models  can  be  used  to 
generate  new  experiments  whose  results  can  be  used  to  better 
understand  the  domain.  This  requires  finding  the  n  models 
which  provide  the  best  values  for  the  objective  function  but 
are  sufficiently  different  from  each  other.  Just  finding  the  n 
best  models  would  most  likely  result  in  finding  a  good  model 
and  slight  variations  of  it. 

To  achieve  our  aim  we  must  handle  two  key  issues.  Firstly,  we 
must  be  able  to  quantify  the  difference  between  two  models. 
Secondly  we  must  adjust  our  search  mechanism.  Let  us 
discuss  the  first. 

4.1 Quantifying The Difference Between Models

A  model  can  be  characterised  by  its  predictions  or  its  syntactic 
description.  A  model  (the  taxonomy)  makes  predictions  on 
how  to  group  together  objects.  Each  model  assigns  each  object 
to  a  cluster.  For  two  clusters  from  different  models,  we  can 
measure  the  similarity  between  them  by  counting  the  number 
of  common  objects.  For  two  models  we  can  measure  their 
similarity  by  counting  the  number  of  common  objects  for 
every  possible  combination  of  cluster  pairs  (one  from  each 
model).  This  is  inherently  a  measure  of  the  common  "cluster 
neighbourhood"  (clusterhood)  each  entity  has  in  two  different 
models. This measure can be achieved by building an r x c
contingency table, P, where r is the number of clusters in
model A ($M_A$) and c is the number of clusters in model B
($M_B$). The cell $P_{ij}$ in the table holds the number of objects
common to cluster i in $M_A$ and cluster j in $M_B$. The total count
of the table will be the number of objects/entities we are
clustering. Where $M_A$ is the same as $M_B$, only the leading
diagonal of the resultant table will contain non-zero elements.
A  model  can  also  be  characterised  by  its  description.  In 
clustering,  a  model  consists  of  classes  and  their  descriptions. 
In  our  approach  each  class  description  contains  a  probability 
distribution  for  each  attribute.  The  message  we  construct 


(whose  length  we  are  trying  to  minimise)  only  encodes  an 
attribute  distribution  of  a  class  if  it  is  sufficiently  different 
from  the  population's  (collection  of  all  objects)  distribution 
for  that  attribute.  We  can  characterise  the  descriptive 
difference  between  two  models  in  a  contingency  table,  D, 
which  has  the  same  structure  as  the  contingency  table  P.  For 
each attribute we can determine which of the clusters for each
model has a distribution for that attribute that is furthest
from the population's. The cell $D_{ij}$ holds the count of attributes
for which cluster i in $M_A$ and cluster j in $M_B$ are the clusters
whose probability distribution is of greatest distance from the
population's distribution. The total count for the table is the
number of attributes. The contingency table inherently holds the
distinguishing features (attributes) of each class.
The  contingency  tables  P  and  D  contain  the  differences 
between  the  two  models  A  and  B  in  their  most  rudimentary 
forms  (predictions  and  descriptions).  We  can  use  this 
information  to  measure  if  a  relationship  exists  between  the  two 
models.  Note  we  do  not  attempt  to  determine  what  the 
relationship  is,  only  if  it  exists.  This  can  be  achieved  by  using 
a  number  of  different  contingency  table  association  measures 
[15].  We  choose  the  Goodman  and  Kruskal  lambda  measure 
of  predictive  ability  because  it  is  both  a  readily  interpretable 
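probability measure and is not symmetrical. The measure $\lambda_{ab}$
measures the ability to predict the cluster in model B given we
know the cluster in model A. It should be noted that $\lambda_{ab} \neq \lambda_{ba}$
is generally true. That is, A may be predictable from B but not
B from A and vice versa. Specifically, $\lambda_{ab}$ calculates the
relative decrease in the probability of an error in guessing the
class given by model B if the class given by model A is known.
Formally we can write:

$$\lambda_{ab} = \frac{P_1 - P_2}{P_1}, \qquad P_1 = 1 - \frac{\max_j n_{\cdot j}}{N}, \qquad P_2 = 1 - \frac{\sum_{i=1}^{r} \max_j n_{ij}}{N} \tag{5}$$

where $n_{ij}$ is the count in cell (i, j) of the table, $n_{\cdot j}$ is the
total of column j and N is the total count of the table.
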


By calculating the lambda value for the tables P ($\lambda_{ab}(P)$) and
D ($\lambda_{ab}(D)$) we can measure the predictability of a model from
another in terms of predictions and descriptions respectively.
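
As an illustration of the measure above, the sketch below builds the contingency table P from two cluster assignments and computes the Goodman and Kruskal lambda for predicting the column model from the row model. The function names are hypothetical, and returning 0 when the column model has a single cluster is a convention assumed here rather than something stated in the text.

```python
from collections import Counter


def contingency_table(labels_a, labels_b):
    """P[i][j] = number of objects placed in cluster i by model A
    and in cluster j by model B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both models must label the same objects")
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows]


def goodman_kruskal_lambda(table):
    """Relative reduction in the error of guessing the column cluster
    (model B) once the row cluster (model A) is known."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    p1 = 1 - max(col_totals) / total                   # error without knowing A
    p2 = 1 - sum(max(row) for row in table) / total    # error when A is known
    if p1 == 0:
        return 0.0  # assumed convention: nothing to predict
    return (p1 - p2) / p1


def transpose(table):
    """Swap the roles of the two models to obtain the reverse measure."""
    return [list(col) for col in zip(*table)]


# Model A splits a group that model B keeps together, so B is fully
# predictable from A but A is only partly predictable from B.
labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [0, 0, 0, 0, 1, 1]
P = contingency_table(labels_a, labels_b)
lambda_ab = goodman_kruskal_lambda(P)
lambda_ba = goodman_kruskal_lambda(transpose(P))
```

The same function applies unchanged to the description table D, since only the cell counts are used.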

4.2  Adjusting  The  Search 

The  ideal  annealing  algorithm  converges  to  the  global 
optimum.  However  the  trajectory  through  the  model  space  in 
getting  there  may  not  be  sufficiently  diverse  to  find  other  good 
but  different  local  optima.  We  must  therefore  adjust  our  search 
method  to  be  consistent  with  our  aim.  We  can  achieve  this  by 
introducing  a  bias  which  guides  the  search  away  from  already 
found  good  local  optima.  This  is  facilitated  by  storing  n 
models  which  are  the  best  (with  respect  to  the  objective 
function)  but  sufficiently  different  from  each  other.  These 
models  are  the  best  and  most  diverse  models  known. 
Models are only considered for storage if their message
length is less than that of any of the currently stored models. To be
stored, the summation of the model's predictability from every
other stored model must be less than this same measure for
one of the currently stored models. The model whose
predictability is the greatest is replaced. Predictability is
calculated  using  the  measures  for  either  the  P  or  D 
contingency  tables.  Candidate  models  have  a  penalty  added  to 
their  "goodness"  value  in  proportion  to  their  similarity  to  the 
stored  models. 

5.  The  Uses  of  Multiple  Models 

In  the  previous  section  we  defined  two  measures  of  difference 
between  models.  These  measures  can  be  encoded  in  a 
contingency  table  and  the  predictability  of  one  model  from 
another  calculated.  How  we  should  use  these  measures  to 
influence  our  search  depends  on  what  we  are  trying  to 
achieve. In this paper we focus on deciding which set of
experiments is best to conduct next.

The  question  of  how  to  guide  the  next  experiments  to  conduct 
has  been  addressed  in  Lenat's  work  on  AM  [16]  and  Kulkarni 
and  Simon's  work  on  Kekada  [6].  Lenat  described  the  notion 
of  "interestingness"  and  felt  the  system  should  focus  its 
attention  on  interesting  phenomena.  Similarly,  Kekada  focuses 
its  next  experiments  on  surprising  phenomena,  believing  that 
if  a  result  of  an  experiment  was  unexpected  then  the 
knowledge  of  that  area  of  the  domain  is  obviously  lacking  and 
should  be  explored.  Both  approaches  use  heuristics  to  describe 
the  notions  of  interestingness  and  surprise.  As  we  hope  to  have 
available  the  best  but  most  different  models  we  focus  the  next 
set  of  experiments  where  these  models'  predictions  differ.  By 
doing  this  we  can  resolve  which  of  these  models  is  the  better 
for  the  domain.  By  continually  running  the  clustering  system, 
finding  distinct  but  good  models,  and  then  generating  data 
points  where  these  models'  predictions  differ  we  generate  data 
points  where  our  knowledge  of  the  domain  is  contradictory. 

To determine which measure of difference (model
description or prediction) is better we conducted
experiments on the following problem. Consider a population
of objects/entities each having m binary attributes. In the
population there exist m classes. Class i, i = 1..m, can be
precisely described as having the value 0 (false) for all
attributes except the ith, which is 1 (true). Table 1 provides the
precise description for a few classes for the m=10 situation.


Attribute   1   2   3   4   5   6   7   8   9   10
Class 1     1   0   0   0   0   0   0   0   0   0
Class 10    0   0   0   0   0   0   0   0   0   1

Table  1.  Precise  description  for  classes  in  m=10  situation. 

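To determine the proportion of each class in the population we
make use of the fact that the summation of the first r integers is given by:

$$\sum_{k=1}^{r} k = \frac{r(r+1)}{2}$$

From this the relative proportion, $P_i$, i = 1..m, of class i in the
population is given by:

$$P_i = \frac{i}{\sum_{k=1}^{m} k} = \frac{2i}{m(m+1)}$$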

By  using  the  MML  equations  described  in  [1]  we  can 
determine  the  approximate  number  of  objects  (data  points) 
required  to  find  the  true  model  if  the  objects  are  randomly 
sampled  from  the  population.  For  our  trial  set  of  objects  this 
number  was  184.  Below  this  amount  of  data  the  best  model  is 
to  place  all  objects  into  one  class,  indicating  that  the  data  from 
an  information  theoretic  view  is  random.  For  this  data  set  the 
encoding  of  the  model  and  data  for  the  true  model  and  one 
class model were 566.21 and 570.24 nits respectively. The
difference between the lengths is the comparative difference in
likelihood. Thus the true model is approximately $e^{4}$ times
more likely than the one class model for the given data set.
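
The step from message lengths to relative likelihood is a one-line calculation, sketched here. The function name is hypothetical; the only assumption is that lengths are in nits (natural-log units), so a difference of d nits corresponds to a likelihood ratio of e^d.

```python
import math


def likelihood_ratio(length_a_nits, length_b_nits):
    """How many times more probable model A is than model B, given the
    two-part message lengths of each in nits."""
    return math.exp(length_b_nits - length_a_nits)


# Full data set: true model 566.21 nits, one class model 570.24 nits.
# The difference is 4.03 nits in favour of the true model.
ratio_full = likelihood_ratio(566.21, 570.24)

# First 120 objects: true model 415.56 nits, one class model 368.32 nits.
# The difference is 47.24 nits in favour of the one class model.
ratio_reduced = likelihood_ratio(415.56, 368.32)
```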

We conducted trials to determine how successful our strategy
of focusing experiments on where the models' predictions
differ is. In each trial the clusterer was given the first 60, 80
or 120 objects of our data set. As we have shown, this is
insufficient to choose the true model over the one class model.
For 120 objects the true model and one class model had
encoding lengths of 415.56 and 368.32 nits respectively for this
reduced data set. The true model is approximately $e^{47}$ times
less likely than the one class model for the data. The class
distributions follow in Table 2:


Class Number   1   2   3   4    5   6    7    8    9    10
Frequency      5   1   7   13   7   13   15   17   21   21

Table  2.  Class  distributions  for  initial  120  objects 
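
The artificial population can be reproduced in a few lines. This sketch assumes that class i occurs with relative proportion i / (1 + 2 + ... + m), which is inferred from the sum-of-integers remark and is consistent with the frequencies in Table 2, and that objects carry exactly the attribute values of their class with no noise. The function names and the seed are hypothetical.

```python
import random


def class_proportions(m):
    """P_i = i / (1 + 2 + ... + m) = 2i / (m (m + 1)) for i = 1..m."""
    total = m * (m + 1) // 2
    return [i / total for i in range(1, m + 1)]


def class_description(i, m):
    """Binary attribute vector of class i: 0 everywhere except attribute i."""
    return [1 if k == i else 0 for k in range(1, m + 1)]


def sample_objects(n, m, seed=0):
    """Draw n (class, attributes) pairs at random from the population."""
    rng = random.Random(seed)
    classes = rng.choices(range(1, m + 1), weights=class_proportions(m), k=n)
    return [(c, class_description(c, m)) for c in classes]


objects = sample_objects(120, 10)
```

With m = 10 the expected count of class 10 among 120 objects is 120 x 10/55, roughly 22, against 21 observed in Table 2.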

Our  aim  is  to  generate  new  data  points  so  that  eventually  the 
true  model  is  found.  The  number  of  new  experiments  required 
to  converge  to  the  true  model  is  one  obvious  measure  of 
performance. 

5.1 How To Generate New Experimental Data

The  process  of  running  the  clustering  system  with  a  given  set 
of  data  produces  a  number  of  theories  (taxonomies/models)  of 
the  data.  We  wish  to  generate  new  experimental  data  which 
can  be  used  in  further  applications  of  the  clustering  system  to 
better  understand  the  domain  and  find  the  true  model.  We 
focus  on  generating  new  data  where  the  predictions  of  the 
theories are different. An example-based approach is used,
where an example of an object that the models' predictions
disagree upon is used as input to an experiment. The
experiment takes the object as an input and returns similar
objects. For additional complexity there is a stochastic aspect
to the experiments, which results in the chance that the
experiment will return the wrong result. In our studies this
error rate is 25%.

The examples can be selected by re-arranging a P contingency
table so that the leading diagonal has the largest counts. The
remaining elements represent objects which are predicted
differently by these two models. Completing this task for all
possible pairs of models can determine those objects for which
the models' predictions differ the most.
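
One way to carry out the re-arrangement for a single pair of models is sketched below. The greedy largest-overlap-first pairing is an assumption made for brevity (an optimal assignment could be substituted), and the function names are hypothetical.

```python
from collections import Counter


def align_clusters(labels_a, labels_b):
    """Pair each cluster of model A with at most one cluster of model B,
    taking the largest overlaps first. This plays the role of re-arranging
    the table so that the leading diagonal holds the largest counts."""
    counts = Counter(zip(labels_a, labels_b))
    pairing = {}
    used_b = set()
    for (a, b), _ in counts.most_common():
        if a not in pairing and b not in used_b:
            pairing[a] = b
            used_b.add(b)
    return pairing


def disagreement_objects(labels_a, labels_b):
    """Indices of objects that fall off the aligned diagonal, i.e. objects
    the two models group differently."""
    pairing = align_clusters(labels_a, labels_b)
    return [k for k, (a, b) in enumerate(zip(labels_a, labels_b))
            if pairing.get(a) != b]


labels_a = [0, 0, 0, 1, 1, 1]
labels_b = [5, 5, 6, 6, 6, 6]
candidates = disagreement_objects(labels_a, labels_b)
```

Running this over every pair of stored models and counting how often each object is returned gives a simple ranking of the objects to use as experiment inputs.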

Experiments could also be generated by prescription. This
would involve a description of an exemplar object for which,
if it were to exist, the currently stored models would make
contradictory predictions. This exemplar could be
constructed from where cluster descriptions differ the most
between all clusters from one model and all clusters from
another. We have not explored this option as yet.

We established two control trials. One generated new objects
by sampling them from the population (sampPop) whilst
another generated new objects from each class in equal
proportion (equProp).

Table  3  illustrates  the  comparison  between  each  model  search 
and  experiment  generation  technique.  Both  search  techniques 
stored  the  five  best  models  which  had  the  shortest  message 
lengths  but  were  different  from  each  other.  The  techniques 
differed  in  the  notion  of  difference.  Search  technique  A  used 
the  predictive  difference  between  models;  B  the  description 
difference between models. Two experiment generation
techniques were tried: technique C generated new objects in
batches of 10 whilst technique D generated objects in batches
of 20. After each batch was generated, the clusterer was re-run
and the process repeated until the true model was discovered.
The control trials generated objects in batches of 5. All four of
our  variations  outperformed  the  two  control  approaches  by 
requiring  approximately  half  as  many  data  points  to  converge 
to  the  true  model. 


Search Technique                               A     B     A     B     sampPop   equProp
Experiment Generation Technique                C     C     D     D     -         -
New objects required to find true model.
60 initial objects.                            70    70    80    80    170       140
Additional objects required to converge
to true model. 80 initial objects.             50    60    80    60    140       120
Additional objects required to converge
to true model. 120 initial objects.            40    40    60    60    100       80

Table  3.  Results  of  trials 

6.  Discussion  and  Future  Work 

We shall focus our discussion on the situation with 120 initial
objects using search technique A (model difference measured
by predictions) and experiment generation technique C (batch
sizes of 10). The system behavior is summarized in Table 4.


                                                1st     2nd     3rd     4th     5th
                                                Trial   Trial   Trial   Trial   Trial
True Classes Found In One of the Best Models    6       8       7       9       10
Class experiments focus on                      1       3       9       2       -
True Model Found                                No      No      No      No      Yes

Table  4.  Summary  of  Behavior  For  120  Initial  Object  Case. 

The ten classes in the true model are not justified by the initial
data. After the first trial with 120 objects, the best models, in
combination, contained the correct description and object
assignments for six of these classes. The models most
disagreed upon what class the objects in class 1 should
belong to, so more objects similar to this class were requested.
A further four trials occurred before the true model was
found. Of course in our situation we know what the true model
is. It was interesting that the number of true classes found did
not increase monotonically, and that the class on which the
models' predictions differed most was not the least frequent.

One of our aims was to determine the impact of searching the
model space to find the best but most different models with
respect to description and predictions. However, regardless of
the measure of difference used, similar models were found.
We feel this is due to the simplicity of the problem and most
likely this will not occur in more complex domains.

We have not yet made use of any relationship that may exist between
the similarity of two models with respect to their predictive
capability and their descriptions. We can consider five cases:


Relationship                                 Interpretation
$\lambda_{ab}(P) > \lambda_{ab}(D)$          The models' predictions are more similar than their descriptions.
$\lambda_{ab}(P) \gg \lambda_{ab}(D)$        The models' predictions are significantly more similar than their descriptions.
$\lambda_{ab}(P) < \lambda_{ab}(D)$          The models' descriptions are more similar than their predictions.
$\lambda_{ab}(P) \ll \lambda_{ab}(D)$        The models' descriptions are significantly more similar than their predictions.
$\lambda_{ab}(P) \approx \lambda_{ab}(D)$    The similarity between the models with respect to their descriptions and predictions is fairly equivalent.

Table 5. The relationship between measures of similarity
between a model's description and its predictions.

We can diagrammatically represent each situation by
considering a population of things which has only one
attribute, which we believe to be normally distributed. Each
model has only one class whose description is the mean and
standard deviation for that particular attribute. Figure 1
illustrates the situation for $\lambda_{ab}(P) > \lambda_{ab}(D)$. The circles
represent data points. In this situation the models make
similar predictions for the current data points, but they are
evidently different. $\lambda_{ab}(P)$ is larger than $\lambda_{ab}(D)$ because the
current data points do not occur in areas where the models'
predictions would differ. Using and contrasting both measures
could be of benefit.
We  intend  to  explore  the  annealing  literature  to  see  if  any
insight  can  be  provided  to  bias  the  search  technique  to  better 
explore  the  model  space.  The  lambda  measures  of  association 
used  in  the  contingency  tables  whilst  adequate  are  problematic 
for  skewed  distributions.  We  intend  to  explore  measuring  the 
information  content  of  contingency  tables  to  obtain  better 
measures  of  predictability.  As  stated  earlier  we  believe  that 
using  multiple  models  can  address  other  questions  in 
autonomous  learning  such  as  when  to  change  the  model  space 
which  we  plan  to  explore.  Our  clustering  system  can  change 
the  model  space  by  taking  the  Cartesian  product  of  attributes 
and  changing  the  probability  distribution  (discrete  or  normal) 
assumed  for  each  attribute. 

7.  Conclusion 

We  have  developed  a  clustering  system  which  can  search  the 
model  space  for  good  but  distinct  models.  The  difference 
between  models  can  be  measured  regarding  their  predictions 
or descriptions. Using these models can provide insight into
how to address questions facing autonomous learning systems, of
which we have focused on choosing the next set of experiments to
conduct to better understand the domain. We have applied
this clustering system to an artificial problem where the initial
set of data is inadequate to find the true model. We explored
the idea of generating new objects where the models'
predictions differ. We have shown that, for our problem, this
approach finds the true model by generating only half as many
additional objects as blind techniques.

Abstract

Teleoperation  environments  consist  of  viewing  and 
manipulation  controls  to  interact  with  the  remote 
environment.  Camera  control  distracts  the  operator  from 
the  manipulation  task,  leading  to  reduced  efficiency. 
Releasing  the  operator  from  the  camera  control  overhead 
can  be  achieved  with  the  use  of  a  second  operator  or  by 
employing  an  automated  viewing  system.  In  this  paper  we 
present  a  model  for  automated  camera  positioning  that 
can  track  a  task  and  index  a  set  of  viewing  actions,  visual 
acts,  that  in  turn  control  camera  placement.  We  illustrate 
the  use  of  "multi-agent"  architectures  to  create  viable 
experimental  environments  for  studying  automated  camera 
control  and  we  present  a  multi-agent  model  we  are 
currently  exploring  for  creating  an  automated  camera 
positioning  support  for  operators  in  teleoperation 
scenarios. 

1.  Introduction  and  Background 

Teleoperation  environments  consist  of  viewing  and 
manipulation  controls  to  interact  with  the  remote  cameras 
and  manipulators  respectively.  Typically  teleoperation 
tasks can be performed in two ways. In the first case, a single
operator has sole control. However, the complexity of
the controls means that typically the operator will control
either the robot manipulators or the cameras, but not be
able to control both at the same time. This means that the
task of controlling the cameras tends to distract the operator
from the manipulation task, and leads to significantly
reduced  efficiency  [11]. 

In  the  second  case,  the  teleoperation  task  is  performed 
by  two  operators.  The  first  operator  controls  the  robot 
manipulators, and the second operator controls the cameras
in an attempt to provide the first operator with the visual
information required to perform the task. Although this
is  an  improvement  for  the  first  operator,  extensive  operator 
training  may  be  required.  One  of  the  key  challenges  in  this 
mode  of  working  is  to  structure  the  task  such  that  the 
intentions  of  the  first  operator  are  signalled  clearly  to  the 
second.  This  is  normally  achieved  using  a  pre-compiled 
task  script  and  communication  between  the  two  operators 
to  register  their  actions  within  the  real-time  setting. 

The  aim  of  automating  camera  control  is  to  remove  the 
need for the second operator without incurring the burden
of camera control on the first operator. A secondary aim is
to  reduce  the  complexity  of  the  operator  interface,  and 
wherever  possible,  to  offer  additional  assistance  to  the 
operator  by  complementing  the  sensory  data  from  the 
remote  environment  [11]. 

In  a  previous  paper  we  proposed  to  use  deliberative  task 
models  and  a  reactive  architecture  to  achieve  these  goals 
[9].  The  deliberative  model  captured  the  notion  of  a  script 
and the reactive model addressed the real-time environment.
The automated viewing system, in this context, represents
an "autonomous" agent acting cooperatively with
the  operator.  It  takes  away  the  responsibility  for  visual 
control,  leaving  the  operator  to  focus  attention  on  the 
manipulation  task.  The  success  of  the  cooperation 
depends,  however,  on  signalling  the  operator's  intentions 
to  the  viewing  system.  In  particular,  it  is  essential  that  the 
automated  viewing  system  remain  stable  and  provide  a 
consistent  quality  of  service  to  the  operator.  Our  aim,  in 
particular,  is  for  the  viewing  system  to  be  "responsive"  to 
the  operator  so  that  the  operator  still  retains  the  feeling  of 
being  in  control,  while  remaining  "stable." 

The  actions  of  the  operator  and  their  concomitant 
impact  on  the  remote  environment,  are  the  prime  source  of 
signals  for  an  automated  viewing  system.  However,  they 
may  not  reveal  themselves  without  some  form  of  coercion 
during  the  design  of  the  operator  interfaces  and  the  task 
environment.  Therefore,  autonomous  systems  need,  in 
many  cases,  to  be  situated  within  an  operational  context. 
Within this context they are expected to behave intelligently.
In a wider context, however, they may not display
the same level of intelligent action, and we should not
necessarily expect them to do so.

We should, however, expect the evolution of the autonomous
agent to follow a path of generalisation as we
develop techniques which allow us to relax the coercion.
We might actually say that this evolution may progress in
tandem with our ability to discriminate signals. The hybrid
architecture we have proposed lends itself to a man-machine
interface model in which manipulation actions
can be enhanced to provide clear signals to the viewing
system, while retaining a manipulation focus. We have to
be clear, however, that while we are granting autonomy to
the viewing system, we are at the same time creating a
symbiotic link which ties the autonomous agent to a specific
operational context.

The  remainder  of  this  report  is  based  on  work  being 
done  in  the  Active  Robotics  Laboratory  at  the  University 
of Reading to demonstrate computer aided camera placement
techniques for teleoperation. In the next section we
outline a model for automating camera placement. In section 3
we present some ideas for the simulation and implementation
of this model in a real robotics laboratory, and in
section  4  we  present  the  specific  model  we  are  currently 
exploring.  In  section  5  we  draw  our  conclusions. 

2.  Intelligent  Automated  Camera  Placement 

Our  model  for  automated  camera  placement  makes  use 
of  a  high-level  task  based  schedule.  The  components  of  the 
schedule  are  used  simply  to  provide  the  context  of  a  task 
operation.  The  cameras  are  placed  under  the  control  of  a 
reactive decision making system which attempts to determine
what the operator is "looking at" and provide suitable
views. 

The task model provided is generated from prior training
runs. During these training runs, a sequence of high
level instructions such as "pick up bolting tool", or "place
screwdriver head in screw" can be determined. These
instructions are then classified as generic operations, such
as "peg-in-hole", which are provided to the viewing system
in the form of an operation-based schedule.

Visual  Goals 

Our  proposal  is  based  on  decomposing  task  operations 
into  simple  atomic  goals  [1][7][8].  These  goals,  which 
match  the  model  of  human  perception,  are  computed  as  if 
the  operator  were  performing  an  assembly  task.  The  notion 
is  to  use  assembly  planning  techniques  to  compute  a  set  of 
atomic  geometric,  kinematic  or  topological  constraints  that 
must  be  satisfied  in  order  to  perform  the  operation  [4]. 

An example is the "peg-in-hole" operation (Fig. 1). To
perform  this  operation,  the  operator  must  first  satisfy  three 
basic  geometric  constraints  before  inserting  the  peg  into 
the  hole,  namely: 

•  Axes  of  peg  and  hole  must  be  parallel. 

•  Axes  of  peg  and  hole  must  be  coincident. 

•  Lower surface of peg must be flush with the upper surface
of the hole.
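
The three constraints listed above can be expressed as simple geometric tests. This is an illustrative sketch only: the `Axis` type, the function names, the tolerance values and the use of a single height coordinate for the flush test are assumptions introduced here, not part of the model described in the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    point: tuple      # any point on the axis, (x, y, z)
    direction: tuple  # direction vector, need not be unit length


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(sum(a * a for a in u))


def axes_parallel(peg, hole, tol=1e-6):
    """True when the two direction vectors are parallel or anti-parallel."""
    c = _cross(peg.direction, hole.direction)
    return _norm(c) <= tol * _norm(peg.direction) * _norm(hole.direction)


def axes_coincident(peg, hole, tol=1e-6):
    """True when the axes are parallel and the peg axis lies on the hole axis."""
    if not axes_parallel(peg, hole, tol):
        return False
    offset = _sub(peg.point, hole.point)
    distance = _norm(_cross(offset, hole.direction)) / _norm(hole.direction)
    return distance <= tol


def surfaces_flush(peg_bottom_height, hole_top_height, tol=1e-6):
    """True when the lower peg surface meets the upper surface of the hole,
    with both heights measured along the hole axis."""
    return abs(peg_bottom_height - hole_top_height) <= tol


hole = Axis(point=(0.0, 0.0, 0.0), direction=(0.0, 0.0, -2.0))
aligned_peg = Axis(point=(0.0, 0.0, 5.0), direction=(0.0, 0.0, 1.0))
offset_peg = Axis(point=(1.0, 0.0, 5.0), direction=(0.0, 0.0, 1.0))
```

Each predicate corresponds to one visual goal, so a viewing system could use the unsatisfied predicates to decide which visual acts are still relevant.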

Each  constraint  is  considered  a  visual  goal,  and  each  visual 
goal  has  associated  with  it  a  visual  act  to  satisfy  that  goal. 
Since  the  visual  goals  are  independent  of  each  other,  the 
visual  acts  can  be  performed  sequentially  in  any  order,  or 
indeed  in  parallel.  Constraints  on  the  sequencing  of  the 
visual  goals  are  imposed  by  the  task  setting.  For  example, 
the parallel and the coincident constraints have to be satisfied
before the insertion of the peg in the hole can be completed.

[Figure 1: The peg-in-hole operation. Panels: a. parallel axes; b. coincident axes; c. insertion]

A "visualisation model" for a task can therefore be
defined as follows: A task can be segmented into a set of
operations and a set of constraints on the sequencing of
these operations. Associated with each operation is a set of
visual goals and, implicitly, a set of constraints imposed by
the task setting.
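
The definition of a visualisation model maps naturally onto a small data structure. The sketch below is one possible encoding under stated assumptions: the class and field names, and the visual act identifiers, are hypothetical and chosen only to mirror the terms used in the text.

```python
from dataclasses import dataclass, field


@dataclass
class VisualGoal:
    name: str
    visual_act: str  # identifier of the viewing action that satisfies the goal


@dataclass
class Operation:
    name: str
    visual_goals: list = field(default_factory=list)
    # (earlier, later) pairs of goal names imposed by the task setting
    goal_ordering: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    operations: list = field(default_factory=list)
    # (earlier, later) pairs of operation names
    operation_ordering: list = field(default_factory=list)

    def visual_goals_for(self, operation_name):
        """Index the visual goals associated with a named operation."""
        for op in self.operations:
            if op.name == operation_name:
                return list(op.visual_goals)
        raise KeyError(operation_name)


peg_in_hole = Operation(
    name="peg-in-hole",
    visual_goals=[
        VisualGoal("axes parallel", "view_axes_side_on"),       # hypothetical act
        VisualGoal("axes coincident", "view_along_hole_axis"),  # hypothetical act
        VisualGoal("surfaces flush", "view_contact_plane"),     # hypothetical act
    ],
)
task = Task(name="fit bolt", operations=[peg_in_hole])
goals = task.visual_goals_for("peg-in-hole")
```

Keeping the ordering constraints separate from the goals reflects the observation that the visual goals themselves are independent and only the task setting sequences them.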

Visual  Acts 

As  mentioned,  a  visual  act  is  the  viewing  action 
required  to  satisfy  a  visual  goal.  Specifically,  since  it  is  the 
operator that performs the manipulative action, the automated
camera placement system must use the information
in  the  visual  act  to  provide  a  set  of  viewing  parameters  for 
the  camera  system. 

The  camera  placement  system  must  track  the  operator 
continually.  In  particular  it  must  determine: 

•  Which specific action, or actions, the operator is currently
trying to perform in pursuit of the task.

•  As  the  workspace  evolves,  how  best  to  set  the  viewing 
parameters  to  satisfy  each  visual  act. 

Ideally, there would be no explicit communication of
the operator's intentions to the automated camera positioning
system. However, this is not realistic at this time since
even in the dual-operator scenario, where the second operator
is also an "autonomous" agent, explicit communication
of intentions is required to register the manipulation
and viewing actions. Our aim is to at least retain the majority,
if not all, of the operator's attention focused on the
manipulation task. We are, therefore, willing to consider
additional modules or actions within the manipulation
interface which on the one hand can be motivated by the
manipulation task, but which on the other provide signals
that can be used to identify the operator's intentions.

For  example,  we  might  introduce  an  additional  module 
which  allows  the  operator  to  designate  task  operations  at 
various levels of abstraction. The automated camera positioning
system can then take its cue from this interface
module  to  index  the  associated  set  of  visual  goals. 

Within this top-down context we then propose a reactive
multi-agent model [2] [12] to select and deliver task-relevant
information to the operator. There are a
number of ways in which we can deploy multi-agent models,
apparently to good effect. In the following section we
outline a number of these approaches and describe the
multi-agent environment we are currently implementing to
support them.

3.  Architecture 

Our choice of architecture is not just an implementation
issue. Given the way the problem is modelled above,
it is almost impossible to divorce the architecture of the
implementation from the problem. In this section we hope
to demonstrate how our architecture permits us to implement
a solution and actually assists us with that solution.

3.1  Resource  Contention 

The resources to support remote viewing comprise not
just cameras, but also video displays, network bandwidth,
etc. Four basic categories can be identified:

•  Operator console devices, including six-degree-of-freedom
input devices, and graphical user interfaces.

•  Operator  output  devices,  including  video  displays,  and 
other  devices  to  present  data  to  the  operator. 

•  Remote  environment  sensor  devices,  including  cameras, 
sonar,  and  other  sensors. 

•  System resources, including computation and communication
devices.

These resources are generally limited. Thus, the visual
acts will have to compete with each other for control of the
resources. Some form of conflict resolution needs to be
adopted if contention for resources arises. This means that
it is necessary to identify how important it is to get a particular
visual goal satisfied.

3.2  Multi-Agent  Architecture 

The problem of resolving resource contention and providing
views automatically to the operator cannot be realised
easily using either of the extremes of current agent
architecture design. In classical planning systems, the
planner generates the plan for achieving a goal in very fine
detail before the plan can be executed. These systems rely
heavily on the assumption that the environment is static,
which clearly it is not, since the operator, who is a part of
the viewing system's environment, can perform the task in
any number of ways.

On  the  other  hand,  purely  behaviour  based  reactive 
architectures  [2]  rely  on  the  close  coupling  of  sensors  to 
effectors.  With  these  systems  goals  are  always  represented 
implicitly  according  to  a  fixed,  pre-compiled  ranking 
scheme.  They  also  assume  that  the  requisite  activity  of  the 
system  can  be  implied  directly  from  the  current  state  of  the 
environment  [3].  Again,  this  architecture  is  not  entirely 
suitable  since  visual  acts  are  inherently  parallel,  and  the 
operator  is  free  to  execute  the  task  in  whatever  order  he  or 
she  decides. 

The system, therefore, must maintain a concept of the
global task, but still be responsive to the operator's choice.
Our solution is to adopt a hybrid multi-agent architecture
[5] [12], with each agent executing in parallel to follow the
operator's progress as the operation is being performed. We
use this method in order to allow the simultaneous tracking
of each visual act by the viewing system. Although each
agent embodies a single goal, the agents must compete for the
resources necessary to achieve their goals, and any conflict
must be resolved by negotiation. To do this, each agent is
required to declare a qualitative measure of its contribution
to the system. This requirement forces the system to
reflect on the environment and react to the operator's
actions, while still maintaining a single global task constraint.

Initially the negotiation process may be as simple as
choosing to allocate resources to agents that show the most
promise of delivering a goal. However, to prevent instability,
where agents oscillate between goals, or resources are
switched continuously between agents, some additional
rules may have to be applied. Again, simple rules, such as
retaining the status quo until a particular threshold is
reached, may suffice.
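
The "status quo until a threshold is reached" rule can be sketched as a simple hysteresis test. This is only an illustration under our own assumptions: the function name, the numeric "promise" scores and the switching margin are hypothetical and are not taken from the system described here.

```python
def allocate(current_holder, promise, switch_margin=0.2):
    """Decide which agent should hold a single resource.

    promise maps agent name -> declared promise of delivering its goal
    (higher is better). The current holder keeps the resource unless a
    rival exceeds it by more than switch_margin (hypothetical threshold).
    """
    best = max(promise, key=promise.get)
    if current_holder is None or current_holder not in promise:
        return best
    if promise[best] - promise[current_holder] > switch_margin:
        return best
    return current_holder


# Three successive negotiation rounds for one camera.
holder = None
history = []
for scores in [
    {"align_axes": 0.60, "check_depth": 0.50},
    {"align_axes": 0.55, "check_depth": 0.60},  # narrow lead: keep status quo
    {"align_axes": 0.30, "check_depth": 0.70},  # clear lead: switch
]:
    holder = allocate(holder, scores)
    history.append(holder)
```

A larger margin trades responsiveness for stability; choosing it would be an empirical matter.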

In order to map the problem onto a multi-agent architecture
we have identified three approaches.

Resource  Centred 

Using  this  approach,  each  "resource  agent"  is  assigned 
sole  control  over  a  single  resource  (camera,  sonar  sensor). 
At  the  beginning  of  the  task,  the  operation  is  divided  into 
its  component  visual  goals  (as  described  above),  and  each 
visual  goal  is  injected  into  a  central  pool. 

The  goals  in  this  pool  are  offered  to  the  resource  agents 
for  "adoption".  The  agents  negotiate  with  each  other  to 
decide  which  agent  should  adopt  which  visual  goal.  Once 
an  agent  has  adopted  a  goal,  the  role  of  that  agent  is  to 
manage  its  resource  in  order  to  provide  the  information 
necessary  for  the  visual  act. 

In order to cope with the evolving nature of the environment,
the agents are required to continue their process of
negotiation. The effect is that each resource is occupied
with providing information to the operator that most
closely matches its situation in the environment.
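
As a rough sketch of goal adoption from a central pool, the negotiation can be approximated by a greedy one-to-one matching on suitability scores. The greedy rule, the score table and all names below are our assumptions for illustration; the actual negotiation protocol is not specified in this paper.

```python
def adopt_goals(suitability):
    """Greedy adoption of pooled visual goals by resource agents.

    suitability maps (agent, goal) -> score, higher meaning the agent's
    resource is better placed to serve that goal. Each agent adopts at
    most one goal and each goal is adopted by at most one agent.
    """
    assignment = {}
    taken_goals = set()
    ranked = sorted(suitability.items(), key=lambda kv: kv[1], reverse=True)
    for (agent, goal), _score in ranked:
        if agent not in assignment and goal not in taken_goals:
            assignment[agent] = goal
            taken_goals.add(goal)
    return assignment


# Hypothetical scores for two cameras and two visual goals.
suitability = {
    ("camera_1", "peg_axis"): 0.9,
    ("camera_1", "insertion_depth"): 0.4,
    ("camera_2", "peg_axis"): 0.7,
    ("camera_2", "insertion_depth"): 0.6,
}
assignment = adopt_goals(suitability)
```

Re-running the function whenever the scores change corresponds to the continued negotiation described above.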

Requirements  Centred 

Since the task is decomposed into its individual visual
goals, this approach models each goal as an agent. These
"requirements agents" are solely responsible for their visual
goal being serviced by the available resources (Fig. 2).

Each  agent  negotiates  with  the  others  for  control  of  the 
necessary  resources  to  achieve  its  goal.  As  with  the 
resource  centred  approach,  the  agents  continue  to  monitor 
the  progress  of  the  operator  and  renegotiate  for  control  of 
resources as the operator progresses through the task.

Figure 2: Resource agents for peg-in-hole operation

Hybrid  Approach 

In the "peg-in-hole" example used above, one of our
three basic geometric constraints was given as:

Axes of peg and hole must be coincident.

To achieve this, the operator may choose to divide this
goal into yet simpler tasks:

X-axis of peg and hole must be coincident.

Y-axis of peg and hole must be coincident.

The operator's choice of how to divide the operation further
will depend on the available cameras in the environment
and on any specific kinematic constraints (for
example a camera may be constrained to pan/tilt control
only).

Using the hybrid approach we consider the problem in
two parts. The first stage is similar to the resource centred
approach where each resource is modelled by an agent.
The visual goals are presented to the agents, which use
knowledge of the environment and of their respective cameras
to propose further subdivisions of the goals. All the
new goals are then injected into the central pool of visual
goals.

In  the  second  stage,  the  visual  goals  are  themselves 
modelled  as  agents.  These  agents  are  now  responsible  for 
negotiating  with  each  other  to  gain  resources  to  achieve 
their  particular  visual  act. 

The advantage of this approach is that it increases the
system's flexibility and allows it to cope with particular
hardware constraints or, if there is a surplus of
resources, to maximize the benefit
gained from these resources.

3.3  A  Multi-Agent  Implementation 

Our system places very few constraints on the nature,
implementation, or function of an agent. Agents can communicate
with each other, enter into negotiations, issue
instructions to the remote environment control system,
apply for a sensor "feed" from any part of the remote environment
or from each other, or perform any other action
virtually without constraint.

However, an agent must embody a single goal, process
or strategy. Agents must maintain their own local information
necessary to perform their task. We may relax this rule
for performance reasons, although only where this is transparent
to the function of the agent.

With the exception of agents with control over a particular
piece of hardware, agents have no physical properties.
The "resource agents" are given control over only one specific
piece of hardware, such as a camera or sonar device.
In order for the system to perform at all, the agents must
cooperate.

Our system also places very few constraints on the
nature of inter-agent communication, or agent to remote
environment communication. In addition to traditional
explicit communication channels, we expect to use
implicit communication channels. For example, the basis
of the approaches outlined above was the visual goal or
resource "pools". These pools can be seen as implicit communication
channels.

Our implementation of agents as individual processes,
perhaps distributed across several machines, allows us to
develop solutions that are truly flexible. Agents are thus
forced to use the inter-process and network communications
facilities available. This approach is necessary in
order to simulate the dual operator scenario; however, it
also affords several additional benefits.

Clearly, two implementations of an agent, so long as
they maintain the same interface, are interchangeable. We
make use of this feature to implement some of the hardware
resources in simulation. This allows us to develop the
computational agents, test them in simulation, and then
simply plug in the real hardware control agents in place of
the simulation to test our algorithms with the real robotics
hardware. Figure 3 shows how the modules in the system
relate to one another. The modules (such as the robot
simulation module) are each implemented as an agent or set of
agents, and communicate with each other using standard
point-to-point and/or shared memory communication.
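
The interchangeability of agents that share an interface can be illustrated with a small sketch. The interface below (a pan/tilt camera with two methods) is hypothetical and chosen only for illustration; it is not the interface used in our system, and a real hardware agent would implement the same methods by talking to a device.

```python
from typing import Protocol


class CameraAgent(Protocol):
    """Hypothetical interface shared by simulated and real camera agents."""

    def set_view(self, pan: float, tilt: float) -> None: ...

    def current_view(self) -> tuple[float, float]: ...


class SimulatedCamera:
    """Stands in for the hardware control agent during simulation."""

    def __init__(self) -> None:
        self._view = (0.0, 0.0)

    def set_view(self, pan: float, tilt: float) -> None:
        self._view = (pan, tilt)

    def current_view(self) -> tuple[float, float]:
        return self._view


def point_at(camera: CameraAgent, pan: float, tilt: float) -> tuple[float, float]:
    """Client code depends only on the interface, not on the implementation."""
    camera.set_view(pan, tilt)
    return camera.current_view()


view = point_at(SimulatedCamera(), 30.0, -10.0)
```

Because `point_at` is written against the interface alone, swapping the simulated agent for a hardware one requires no change to the computational agents.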

Our initial experimental aims have been to measure relative
timings for simple teleoperation tasks using a single
and dual operator system. We have begun with a simulated
environment in order to constrain the problem and to
enforce the assumptions that we have made concerning the
required behaviour of the camera control system and the
accuracy of the workspace model. This simulation environment
is built using a multi-agent approach as outlined
above.

The Active Robotics Laboratory at the University of
Reading will form the basis of experiments using real
robotic manipulators and cameras. The Netrolab environment
[10] provides a set of networked robotics resources
which can be configured to support diverse applications.
The resources include stationary and mobile robots, laboratory
viewing cameras and sonar sensors. The networked
aspect of this laboratory suits the multi-agent architecture
that we have proposed, and makes it particularly easy to
plug the computational modules of the simulation directly
into the laboratory [6].

Figure 3: Multi-Agent Modules

4.  An  Autonomous  System  for  Camera  Control 

The  model  we  are  investigating  for  automated  camera 
control  is  based  on  the  hypothesis  that  knowledge  of: 

•  the  task  setting, 

•  the  task  operation  being  performed, 

•  the  set  of  visual  acts  associated  with  task  operations, 
identified  above, 

•  the  "information"  content  of  the  task  environment, 

will provide sufficient constraints on the operator's intentions
to determine his or her visual goal(s) at any instant
during the performance of the task.

Following the model presented above, we assume that
the task operation is designated by the operator at the
manipulator interfaces. The task setting can be assumed to
be instantiated by the operator prior to task execution and
updated in tandem with the operator's actions. That is, we
assume that the task environment is known. The set of visual
goals is assumed to be identified a priori through automated
task planning or manual processing of task logs.

We also exploit the state of the manipulator in the
remote environment, and the control actions of the operator
to change this state, in order to determine the "information
content" of the task environment. Specifically, we aim
to track the level and the rate of change of the information
associated with each visual goal.
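
As a minimal sketch of tracking the level and rate of change of information for one visual goal, assume the information content can be summarised as a single number sampled at known intervals. The paper does not define this measure, so the class and its names are hypothetical.

```python
class InformationTracker:
    """Tracks the level and finite-difference rate of a scalar measure."""

    def __init__(self):
        self.level = None   # most recent sample, None before the first update
        self.rate = 0.0     # change per unit time between the last two samples

    def update(self, level, dt):
        """Record a new sample taken dt time units after the previous one."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        if self.level is not None:
            self.rate = (level - self.level) / dt
        self.level = level
        return self.level, self.rate


tracker = InformationTracker()
tracker.update(2.0, dt=0.5)
level, rate = tracker.update(3.0, dt=0.5)
```

One tracker per visual goal would give each agent the two quantities it needs to argue its case.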

Our aim is to develop a multi-agent model which will
exploit this information to identify the current visual goal.
In this model an agent will represent a visual goal, and will
use the above information to "argue" its case. The agents
which dominate will determine the visual acts presented to
the operator and their relative configuration in the display
interface. A number of methods for combining the information
will be explored, including rule-based, fuzzy logic
and neural network methods.

We intend to conduct two sets of experiments. The first,
which we designate the "passive operator" scenario, will
allow us to provide a base line model with which to compare
and contrast the impact of the automated viewing system
and allow us to assemble data with which to prime the
latter. In the passive operator experiments, subjects will be
"led" through a sequence of task operations according to a
pre-defined script based on the visualisation model. Quantitative
performance data, and qualitative observations and
reports, will be used to assess the operator's reaction to
this mode of working. We will be particularly interested to
determine whether the operator feels "in control".

The second set of experiments will evaluate the multi-agent
model within an "active" operator scenario; the operator
is free to direct the progression of the task. We are
keen to observe the operator's performance within this scenario
and his or her awareness of the viewing system. We
are particularly interested in determining the operator's
willingness to adapt to the viewing system as well as the
required adaptation of the viewing system to the operator.

Another area to which we will pay particular attention is
the situation where there is no clear winner among the
agents. We expect that an inertial factor will need to be
introduced in order to smooth the transition between visual
acts, and thereby maintain the "stability" mentioned
earlier. These experiments are now being developed and
will be reported in subsequent papers.
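
One possible form of such an inertial factor is exponential smoothing of the agents' scores, sketched below. This is our illustrative assumption, not the mechanism that will be used in the experiments; the function name and the inertia value are hypothetical.

```python
def smooth(previous, current, inertia=0.8):
    """Blend new agent scores with the previous smoothed scores.

    inertia in [0, 1): values near 1 make the ranking change slowly.
    Agents absent from previous start from a smoothed score of 0.
    """
    return {
        name: inertia * previous.get(name, 0.0) + (1.0 - inertia) * score
        for name, score in current.items()
    }


# A momentary reversal in raw scores does not change the winner.
smoothed = smooth({"align_axes": 1.0, "check_depth": 0.0},
                  {"align_axes": 0.4, "check_depth": 0.6})
winner = max(smoothed, key=smoothed.get)
```

Only a sustained lead in the raw scores would eventually change the displayed visual act.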

5.  Summary  and  Conclusions 

In this paper we have presented a model for automating
viewing for teleoperation. We have discussed several techniques
for realising this model using a multi-agent architecture,
and have shown how this architecture can promote
a flexible solution without incurring additional cost.

During teleoperation our system will track the progress
of the operator and support him or her by providing views
based on the visual acts model. This model is designed
with stability as an important goal, in order that the environment
remains firmly under the control of the operator.
Man-machine interface design is a key tool in providing a
symbiosis between the operator and the viewing system
which maintains stability in the latter.

Using the peg-in-hole example, we are currently conducting
tests on both single and dual operator controlled
simulated teleoperation environments, and have identified
several key aspects that warrant particular investigation.
Our laboratory, which is well equipped for teleoperation
applications, will allow us to expand our experiments to
use real robotic manipulators and camera viewing systems.

ABSTRACT

In this paper we briefly describe our approach to recognition and
learning for autonomous mobile systems. Our system architecture
is a multi-skill-oriented architecture. Each skill in our architecture
comprises components for reflexive behavior and components for
complex behavior. Complex behavior is represented by our Applied
Memory-Based Reasoning approach (A-MBR). This is a
scalable algorithm for fast recognition using a large extendable
database containing sparsely coded knowledge. Application of
A-MBR to robot navigation is explained and learning principles are
discussed. We are working on the extension to 3D recognition. We
integrate principles from geometric hashing [2] and memory based
reasoning [4] approaches known in the literature.

KEYWORDS:  robot  navigation,  system  architecture, 
learning,  memory-based  reasoning,  geometric  hashing. 


1. INTRODUCTION

Autonomous systems such as robots are used for transport
and service tasks in industry, manufacturing and space missions,
where they have to solve various complex problems
in the future. It will become necessary to recognize the real
world, to handle large data sets, to learn unknown objects,
environments, rules and behavior. A mobile system has to
behave safely and reliably and to communicate at least on an
interpretable level to make its behavior, plans and intentions
transparent to cooperating systems and humans. To improve
acceptance, the behavior should be safe and transparent to
the user. The system should be able to react to unexpected
events reliably and autonomously. Intelligent systems such
as autonomous mobile robots need models for intelligent behavior
besides the reflexive behavior. Applications are intelligent
sensor-based control together with model-based predictive
control for motion and manipulation tasks. For
intelligent control we need improved fast prediction in the
real-time range using realistic, large model bases. We need
faster and more precise model matching and simplified
model learning.

Section 2 describes the architecture as well as our framework
for A-MBR. Section 3 gives a short overview of Recognition
by Geometric Hashing. Section 4 is dedicated to
our robot application. Subsequent sections describe
recognition/learning principles and experimental results.

2. SYSTEM ARCHITECTURE

A schematic of our multi-skill-oriented architecture for
learning systems is shown in Figure 1. It is similar to the
well-known reference architecture [1]. Each skill frame
contains the functional blocks planning, reflexive behavior
(RBF-like NN) and complex behavior (based on models).

Figure 1 Schematic of system architecture

In Figure 1 some functional blocks are labeled by the capital
letter L which indicates that their knowledge representations
are potentially open for learning. It would not be helpful
to apply learning procedures to all knowledge
representations in parallel but flexible access to all knowledge
representations offers high potential for new learning
principles. In this paper, we describe some aspects of our
new Applied Memory-Based Reasoning (A-MBR) approach.
This is our framework for handling and representing
complex knowledge. A-MBR results in more intelligent behavior.
It contains numerical and symbolic data of objects,
relations, and rules. It is a large extendable database of
sparsely coded knowledge. We use principles of the
Memory-Based-Reasoning approach for handling knowledge
such as relevance feedback. Data sets are derived from
world models (e.g. CAD data) and from sensor data. Incomplete
knowledge can be used. A-MBR provides symbolic
and numerical entries into the database, which means at least
symbolic input and symbolic output on the communication
level and numerical input and numerical output on the control
level (see Figure 2). The semantic layer is connected
to the numerical data layer of our object model database
even if planning models or planning maps are temporarily
generated. In addition, the object database contains attributes
such as docking points, grasping points and relations,
rules, temporal logic, etc.

Figure 2 Schematic of A-MBR knowledge representation

Operator commands, messages from guidance systems, or
messages from cooperating autonomous systems or from
higher control levels, can be communicated to the symbolic
level and can be interpreted by the planner. Cooperative
planning is possible by communication on the common
symbolic level and mapping into the local planner of each
autonomous system, enabling consistency of a common
view. Thus, recognition results, output messages and fault
messages can be interpreted on the symbolic level and can
be output via MMI to an operator or via messages to the
guidance/planning hierarchy for continuing control or for
fault treatment on a higher control level or for changing
cooperation among autonomous systems. Unpredictable situations
can be explained on the symbolic level after recognition.
If the system learns, it is able to generalize learned information
and transform it to the symbolic level. It is an open
approach for complex task planning and execution. This includes
task planning for different actions, skills for manipulation,
and skills for motion.

3.  GEOMETRIC  HASHING 

In  our  A-MBR  framework  Geometric  Hashing  as  promoted 
by  [3]  is  the  preferred  method  for  organizing  the  search  for 
matches  in  the  model-based  vision  system.  It  provides  the 
search  engine  portion  of  the  object  recognition  system. 

A model or object is uniquely identifiable via its features.
Provided that there exists an edge representation for the object,
edge middle points can be taken as so-called point features.
A feature may or may not possess an attached attribute
list containing items such as edge orientation, length, or color.
Edge representation and feature extraction form crucial
inputs to the geometric hashing algorithm. With objects
characterized through point features, the object recognition
problem is equivalent to the problem of recognizing patterns
of point features. An object may, however, have many features.
To keep the pattern-matching task of reasonable complexity,
the novel idea suggested in this paper, as opposed
to the approach pursued by [3], is to recognize an object
on the basis of portions of an object. A portion is a fragment
of an object containing only a bounded number of edges
from a local neighborhood. The total number of different
portions per object may vary but is roughly on the order of
the number of edges that make up the object.

Before the process of pattern matching can commence, the
portions must be normalized. The notion of a transformation
base set is central in the scope of a portion. Such a base
set uniquely determines a similarity-invariant transformation
(translation, rotation, scale) with which the features of
a portion will be transformed from their local coordinate system
into the global hash space. Every normalized feature
contains a reference to the portion it belongs to, which in
turn contains the name of the original object as well as the
transformation parameters. An object is thus multiply encoded
by means of its complete set of normalized portions.
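
The normalization step can be sketched for the 2D case under one common assumption, namely that the base set consists of two distinct points, which are mapped to (0, 0) and (1, 0). The size of the base set used in A-MBR is not stated here, so this choice and the function name are ours.

```python
def normalize_portion(points, base):
    """Map 2D point features into a similarity-invariant frame.

    points: list of (x, y) features of one portion.
    base:   indices (i, j) of the two base points (assumed base set).
    The first base point maps to (0, 0) and the second to (1, 0), which
    removes translation, rotation and scale in one complex division.
    """
    i, j = base
    origin = complex(*points[i])
    unit = complex(*points[j]) - origin
    if unit == 0:
        raise ValueError("base points must be distinct")
    normalized = []
    for x, y in points:
        z = (complex(x, y) - origin) / unit
        normalized.append((z.real, z.imag))
    return normalized


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
# The same square rotated by 90 degrees, scaled by 3 and shifted by (5, 5).
moved = [(5, 5), (5, 11), (-1, 11), (-1, 5)]
a = normalize_portion(square, base=(0, 1))
b = normalize_portion(moved, base=(0, 1))
```

Both calls yield the same normalized coordinates, which is what allows a stored portion and an observed portion to meet in the same region of hash space.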

The recognition phase proceeds as follows. In a preprocessing
stage, an edge segmentation algorithm is applied to the
pixel image of the scene as recorded by the sensor. For every
edge, the corresponding feature together with its attribute
list is calculated. Next, a portion is composed as a collection
of neighboring edges. From the portion's base set of points,
a transformation matrix is determined with which all the features
of this portion are normalized. This normalized portion
to be recognized is now compared against all previously
normalized portions stored in hash space on a feature-by-feature
basis: every feature of the portion is compared
against every other feature in its (hash-space) vicinity. Because
noise has to be taken into consideration, the features
pre-recorded in hash space are assumed to represent the
mean value of a normal distribution $N(\mu, \Sigma)$ (with covariance
matrix $\Sigma$) rather than an exact position. The hash space
should be organized so as to provide quick access to all the
features in the vicinity of a given point. This is achieved by
subdividing the hash space into so-called bins (a mesh of
small boxes with pointers to a list of all features that fall into
the respective bin). Every feature in the vicinity of the portion's
point feature gets a vote that increases with decreasing
distance and approaches 1 as the distance approaches 0. The
following formula is introduced as a rough approximation
for the vote a single feature may contribute:
$v(d) = \exp(-\lambda d)$, where $d$ is the Euclidean distance and
$\lambda > 0$ an adjustable
parameter. All the votes that are computed during a single
hash table inquiry are accumulated on a per-portion basis
(and not a per-object basis). The portions with the two highest
accumulated votes are tracked down and compared
against each other. A positive decision about a correct match
can be made if (i) the absolute value of the vote of the top
portion and (ii) the distance between the votes of the first and
second portions exceed certain thresholds. Otherwise, a
vector with the first k "promising" portions (where k may
vary according to the distribution of the top votes) together
with the votes they received is memorized, and the next hash
table inquiry with a new portion of object features extracted
from the scene is launched.
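
The following sketch illustrates one hash table inquiry: binning, exponential distance-weighted voting accumulated per portion, and the two-threshold decision. It is a simplified 2D illustration; the bin size, decay parameter, thresholds, portion names and the choice of searching the 3 x 3 block of neighboring bins are all assumptions made for the example.

```python
import math
from collections import defaultdict

BIN = 0.5  # hypothetical edge length of a hash-space bin


def bin_of(p):
    """Index of the bin containing the 2D point p."""
    return (math.floor(p[0] / BIN), math.floor(p[1] / BIN))


def build_table(portions):
    """portions: dict portion_id -> list of normalized (x, y) features."""
    table = defaultdict(list)
    for pid, feats in portions.items():
        for f in feats:
            table[bin_of(f)].append((f, pid))
    return table


def vote(table, query, lam=4.0):
    """Accumulate exp(-lam * d) per stored portion over neighboring bins."""
    votes = defaultdict(float)
    for q in query:
        bx, by = bin_of(q)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for f, pid in table.get((bx + dx, by + dy), []):
                    votes[pid] += math.exp(-lam * math.dist(q, f))
    return dict(votes)


def decide(votes, min_vote=2.0, min_gap=0.5):
    """Accept the top portion only if its vote and its lead are large enough."""
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    top_id, top_vote = ranked[0]
    if top_vote > min_vote and top_vote - second > min_gap:
        return top_id
    return None


stored = {
    "door#1": [(0, 0), (1, 0), (0.5, 0.8)],
    "wall#1": [(0, 0), (1, 0), (0.5, -0.9)],
}
table = build_table(stored)
votes = vote(table, [(0.02, 0.0), (1.0, 0.01), (0.5, 0.78)])
match = decide(votes)
```

When `decide` returns `None`, the ranked votes would be memorized and a new portion from the scene queried, as described above.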

If recognition on the portion level is not possible (because
of continued ambiguities), recognition on the object level
might still be possible. In order to achieve this, the votes that
were memorized at the end of each hash table inquiry could
be accumulated across inquiries on a per-object basis. In the
end, one object might have received an overwhelming accumulated
vote that distinguishes it from the others.
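
Accumulating the memorized votes on a per-object basis can be sketched as follows. The data layout (a shortlist of portion and vote pairs per inquiry, plus a lookup from portion to object) is a hypothetical choice for the example.

```python
from collections import defaultdict


def accumulate_by_object(inquiries, portion_to_object):
    """Sum memorized portion votes per object across several inquiries.

    inquiries: list of shortlists, each a list of (portion_id, vote).
    portion_to_object: dict mapping each portion_id to its object name.
    """
    totals = defaultdict(float)
    for shortlist in inquiries:
        for portion_id, v in shortlist:
            totals[portion_to_object[portion_id]] += v
    return dict(totals)


portion_to_object = {"door#1": "door", "door#2": "door", "wall#1": "wall"}
inquiries = [
    [("door#1", 1.5), ("wall#1", 1.25)],  # first ambiguous inquiry
    [("door#2", 1.0), ("wall#1", 0.5)],   # second ambiguous inquiry
]
totals = accumulate_by_object(inquiries, portion_to_object)
```

Here neither inquiry is decisive on its own, but the per-object totals separate the two candidates.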

4.  ROBOT  NAVIGATION 

A promising application field of our A-MBR approach is robotics
and in particular navigation, relocation, position estimation,
recognition/learning and planning as an integrated
solution. Two examples illustrating most aspects are briefly
described in Section 6.

Figure 3 shows the schematic of robot positioning by our
A-MBR procedure using laser range data by means of
known positions of objects (walls). In our first simulated
scenario there are 16 rooms with different complex structures;
few rooms are similar. New rooms can be generated and replace
others. We can place the robot at any position in the
16 rooms with any orientation. We start the relocation process
by scanning the environment at the start position. Cyclic
position estimation and position correction is performed by
the A-MBR recognition (e.g. disturbances by mobile objects
are tolerated up to a certain extent). Each hashed object portion
knows its current position. The output is the name of the
room and the current robot position (computed using recognition
and current laser range data). In case of grasping or
docking, we get the geometric data for the controller (grasping
points, docking points) depending on the recognized object
and its position.


Figure 3 Object recognition for robot relocation


5.  RECOGNITION  AND  LEARNING 

A-MBR is able to recognize known objects, rooms and environments.
Robustness allows the recognition of noisy or
partially covered objects. Figure 4 shows a 2D example of
a real laser scan of our laser sensor (180 degrees). Raw input
data are preprocessed (first-step noise cancellation). Feature
extraction detects lines and other features (depending on the
context). In addition, features such as segmentation points
and lines and points of further interest are derived. Figure 5
shows the simulated set of extracted feature data of
one complete scan (360 degrees) at time t.

Recognition starts as described above with the selection of
information portions (feature neighborhoods) from the feature
data, depending on the focus of interest (e.g. the object nearest
to the sensor). The input data portion is compared to the
hashed object data portions. Each portion belongs to an
object concept with a well-defined name, or it belongs to a
preliminary concept of newly learned data.
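
The comparison of an input portion with the hashed object portions can be pictured as a table lookup followed by vote counting. The sketch below is only an illustration under assumptions that are not taken from the paper: a portion is encoded as a tuple of edge lengths, keys are formed by uniform quantization with an arbitrary step, and the names `portion_key`, `build_table` and `vote` are hypothetical.

```python
from collections import Counter, defaultdict


def portion_key(portion, step=0.25):
    """Quantize a feature portion (assumed: a tuple of edge lengths in
    metres) into a hashable key. Encoding and step size are assumptions."""
    return tuple(round(value / step) for value in portion)


def build_table(models, step=0.25):
    """models: dict mapping object/room name -> list of feature portions.
    Returns a hash table mapping each key to the names that contain it."""
    table = defaultdict(set)
    for name, portions in models.items():
        for portion in portions:
            table[portion_key(portion, step)].add(name)
    return table


def vote(table, observed_portions, step=0.25):
    """Count, per name, how many observed portions hit the hash table."""
    votes = Counter()
    for portion in observed_portions:
        for name in table.get(portion_key(portion, step), ()):
            votes[name] += 1
    return votes


models = {
    "room_A": [(2.0, 3.0, 1.5), (3.0, 1.5, 4.0)],
    "room_B": [(2.0, 3.0, 1.5), (1.0, 1.0, 1.0)],
}
table = build_table(models)
# Slightly noisy observations still fall into the same quantization cells.
votes = vote(table, [(2.02, 2.98, 1.51), (3.01, 1.49, 4.0)])
```

Uniform quantization tolerates small noise but can split nearly identical values that straddle a cell boundary; a practical system would need some tolerance around key boundaries.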

Figure 4  Example of a real laser scan


As described above, the result is the number of matches. The
further recognition process is controlled by the number of
matching objects. Possible answers are, for example: too
many, some, few, one, or no matches. If there is exactly one
match with high probability against the rest of the database,
it is assumed to be correct. If there is more than one match,
more sensor data are needed. No match means either that the
wrong model data set is used, or that the object is unknown
and a candidate for learning. If the results are not clear
enough for precise decision making, the recognition process
continues self-controlled sensor data acquisition step by
step. In most cases only one scan is required to achieve
recognition under normal conditions (one-shot recognition).
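
The control logic just described can be summarized as a small decision function over vote counts. This is a sketch under stated assumptions: the thresholds `min_votes` and `margin`, and the rule that a clearly dominant candidate counts as "exactly one match with high probability", are illustrative choices and not values from the paper.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"              # one clear match: assumed correct
    NEED_MORE_DATA = "more data"   # ambiguous: continue sensor data acquisition
    UNKNOWN = "unknown"            # no match: wrong model set or learning candidate


def decide(votes, min_votes=3, margin=2.0):
    """votes: dict mapping object name -> number of matching portions.
    min_votes and margin are hypothetical tuning parameters."""
    candidates = {name: v for name, v in votes.items() if v >= min_votes}
    if not candidates:
        return Decision.UNKNOWN, None
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best_name, best_votes = ranked[0]
    if len(ranked) == 1 or best_votes >= margin * ranked[1][1]:
        return Decision.ACCEPT, best_name
    return Decision.NEED_MORE_DATA, None
```

For example, `decide({"room_A": 9, "room_B": 1})` accepts `room_A`, whereas `decide({"room_A": 5, "room_B": 4})` asks for more sensor data.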

We introduced new learning procedures for A-MBR.
Self-controlled data acquisition uses some preferred options for
self-planned active sensing, e.g. turning/moving the sensor
head, moving/changing position, motion to the remaining
points of interest, etc. Figure 6 shows the summarized data
set of extracted features computed (merged) from several
scanning cycles (and recognition/learning). One can see the
more complete view compared to Figure 5. We are
accumulating sensor data on the feature level (merging
numerical and symbolic features accompanied by failure
correction).

Figure 5  Extracted features of one laser scan (360 degrees,
robot coordinates)

Figure 6  Summarized feature data of a still unknown object
after a few scan cycles (memory coordinates)

Besides the detected lines there is additional information, in
particular the remaining points of further interest, which are
used for continuing recognition if needed.

If the current task is positioning or recognition, the process
is stopped after successful recognition. The result is the
name of the object/room and the current robot position. If we
need higher reliability, or the object/room seems to be
unknown, the process continues until all available sensor data
are considered. After this exhaustive process (e.g. an
object/room has been completely scanned) the system has to
decide whether or not the object/room was already known to
it. The system can then store the summarized information as a
candidate for learning, usable as preliminary vague data. Real
learning should be accepted only after verification, e.g. on
entering a known area again, or after confirmation of
relevance by higher-level instances. One can see that there is
no difference between the robust recognition procedure, which
uses all available methods for gaining information, and the
process of exploration, i.e. learning new sensor data and
adding them to the world model.

The most interesting and most important task is to find the
best voting principles. This is an open research field.
Different voting principles are applied on different voting
levels, supported by voting on the symbolic level and on the
level of relations and task context.

For systematic investigations of new voting principles and
their recognition probability, and for evaluation of
robustness, we are able to tune the disturbance of the input
data. With increasing disturbance the recognition/learning
process becomes more uncertain, so reliable decision making
takes more time. The result confirms the robustness of the
approach.

Concept generation (self-defined terms) for newly learned
data can be performed by taking the concept of the most
similar data in the existing knowledge and adding comments
expressing similarity, uncertainty and the main recognized
feature differences relative to the feature vector of the
selected concept. The resulting information is, e.g.,
"unknown object, but similar to object B differing in feature c".
If there is no similar knowledge, the simplest procedure is
to append the data to a consecutively numbered list of
unknown objects. Preliminarily named objects have to be
renamed at some later time by considering more knowledge
(via world knowledge) or by knowledge acquisition via MMI
(communication with humans) using the quasi-symbolic
description of features. This is a goal for future research.
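
The naming scheme for newly learned data can be illustrated with a short sketch. It assumes, purely for illustration, that a concept is a dictionary of symbolic feature values and that similarity is the number of differing features; the function name `name_new_concept`, the `max_diff` threshold and the explicit counter are hypothetical.

```python
import itertools


def name_new_concept(features, known, counter, max_diff=1):
    """features: dict feature -> value of the new data.
    known: dict concept name -> feature dict.
    counter: iterator yielding running numbers for unknown objects.
    max_diff is an assumed similarity threshold."""
    best_name, best_diff = None, None
    for name, reference in known.items():
        keys = sorted(set(features) | set(reference))
        diff = [k for k in keys if features.get(k) != reference.get(k)]
        if best_diff is None or len(diff) < len(best_diff):
            best_name, best_diff = name, diff
    if best_name is not None and not best_diff:
        return best_name  # identical feature vector: a known concept
    if best_name is not None and len(best_diff) <= max_diff:
        return (f"unknown object, but similar to object {best_name} "
                f"differing in feature {', '.join(best_diff)}")
    return f"unknown object #{next(counter)}"


counter = itertools.count(1)
known = {"A": {"a": 0, "b": 0, "c": 0}, "B": {"a": 1, "b": 2, "c": 3}}
label = name_new_concept({"a": 1, "b": 2, "c": 9}, known, counter)
```

The preliminary labels produced this way would later be replaced, as described above, once more knowledge or human input is available.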

Note that learning is done via sparsely coded knowledge
data by A-MBR. Hash-table entries are generated from
accepted A-MBR knowledge in order to keep all data
consistent.


6.  SELF-CONTROLLED  LEARNING 

Two examples describe the problems. In the first example,
an autonomous system doesn't know where it is and has
to explore the environment in order to find any known
objects or relocation points (the curious robot). If it starts
without knowledge and without GPS information, it defines
the current position, or a first good position (middle), of the
unknown room as the origin of exploration (learning frame,
preliminary coordinates). Once a first known object/position
is detected, it is able to transform all learned
objects/positions to the coordinate system of the known
objects (knowledge frame, global coordinates). The learning
process has to consider the transformation via the
coordinates of the robot and its sensors (robot frame,
platform coordinates; and sensor frame, coordinates of the
active sensor).
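
The change from the learning frame to the knowledge frame amounts to a rigid transformation that is fixed by one landmark seen in both frames. The following sketch assumes a planar (2D) world and a landmark pose given as (x, y, heading); the function names are hypothetical, and the intermediate robot and sensor frames are left out for brevity.

```python
import math


def frame_transform(pose_learning, pose_global):
    """Poses are (x, y, theta) of the same landmark in the learning frame
    and in the global (knowledge) frame. Returns (dx, dy, dtheta) such
    that p_global = R(dtheta) * p_learning + (dx, dy)."""
    xl, yl, tl = pose_learning
    xg, yg, tg = pose_global
    dtheta = tg - tl
    c, s = math.cos(dtheta), math.sin(dtheta)
    dx = xg - (c * xl - s * yl)
    dy = yg - (s * xl + c * yl)
    return dx, dy, dtheta


def to_global(point, transform):
    """Map a learned (x, y) point from the learning frame to the global frame."""
    x, y = point
    dx, dy, dtheta = transform
    c, s = math.cos(dtheta), math.sin(dtheta)
    return (c * x - s * y + dx, s * x + c * y + dy)


# A landmark learned at (1, 0) with heading 0 is later recognized as a
# known object at (5, 5) with heading 90 degrees in global coordinates.
T = frame_transform((1.0, 0.0, 0.0), (5.0, 5.0, math.pi / 2))
learned_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
global_points = [to_global(p, T) for p in learned_points]
```

With this transform, every object stored in preliminary coordinates can be re-expressed in global coordinates in one pass.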


Figure  7  Derived  planning  map  for  exploration  with  the 
computed  path  from  the  current  position  (right) 
to  the  next  point  of  interest  (left) 

A second example, goal-oriented navigation (fire fighting
robot), is not concerned with exhaustive exploration but
with the fastest motion toward one or more goal positions or
centers of interest (valve, stopcock, center of fire, injured
persons). The robot is oriented toward the goal and learns
on-the-fly. The first priority is the fastest move to the goal,
but along the way it learns differences from the map, such as
closed gateways, working either from old maps or from an
assumption of free space. Its start position is known and the
direction or goal position is known, but the area in between
is unknown (destroyed).

Both procedures are self-planning. Figure 7 shows an
example of dynamic map generation and path planning for an
optimized recognition/learning/exploration process. Input to
the planner is the latest update of the summarized
information (features). The information is used to generate a
grid-based planning model of the desired size, granularity and
amount of obstacle growing. The planning process is
performed after every new scan, or even at a higher cycle
rate, to plan an efficient way to the next point of interest
(point of exploration). We integrated our planning approach [5],
which uses a diffusion algorithm. The resulting path and laser
range data are the input to the platform controller (reflexive
knowledge) for collision avoidance and for following the
path to the next point of interest. If the planner doesn't find
a solution, map generation switches to the higher resolution
level. Planning works with incomplete knowledge. All
actions can be supervised, interpreted and reported using
symbolic knowledge.
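
Grid-based planning with obstacle growing can be sketched as follows. The diffusion algorithm of [5] is not specified in this paper, so the code substitutes a plain wavefront (breadth-first) expansion from the goal as a stand-in; the grid encoding (1 = obstacle, 0 = free), the 4-neighborhood and the function names are assumptions.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_obstacles(grid, radius=1):
    """Return a copy of grid with every obstacle cell enlarged by
    `radius` cells in all directions (a square neighborhood)."""
    rows, cols = len(grid), len(grid[0])
    grown = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        grown[rr][cc] = 1
    return grown


def plan(grid, start, goal):
    """Spread a distance wave from the goal over free cells, then follow
    decreasing distance from the start. Returns a list of cells, or None."""
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] or grid[goal[0]][goal[1]]:
        return None
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    if start not in dist:
        return None  # caller may retry on a finer grid
    path = [start]
    while path[-1] != goal:
        r, c = path[-1]
        reachable = [(r + dr, c + dc) for dr, dc in NEIGHBORS
                     if (r + dr, c + dc) in dist]
        path.append(min(reachable, key=dist.get))
    return path


grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
path = plan(grid, (1, 0), (1, 4))
blocked = plan(grow_obstacles(grid, radius=1), (1, 0), (1, 4))
```

In this small map the grown obstacle closes the only passages, so the second call returns `None`; this is the situation in which the text switches map generation to a higher resolution level.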

7.  EXPERIMENTAL  RESULTS 

Current experiments are done on a Silicon Graphics
workstation. One complete cycle of recognition as described
above takes less than one second for a normal scenario. The
target hardware is parallel digital signal processor systems and
multi-PC systems. Previous investigations of scalability and
speed have been done on the Connection Machine CM-5 with
32 nodes and the data-parallel language C*. Figure 8 shows
the dependency of run time for object recognition
(logarithmic) on the number of objects in the database
(symbolic name of the object, here # of the object in the
database). The objects are polygons with a distinguished edge
for control. The hashed object portions consist of 5 edges in
this experiment.

Figure 8  A-MBR - run time measurement on a parallel
computer system - the Connection Machine CM-5 with 32 nodes


8.  CONCLUSIONS 

The A-MBR approach is able to support recognition and
learning in a very fast and accurate way. It is an open
approach for handling large data sets of sparsely coded and
incomplete knowledge (numerical and symbolic). Our method
provides high functionality, in particular for robust
recognition in a dynamically changing world of objects, for
exploration of unknown space regions accompanied by robust
path planning, and for interpreting actions and events
on a symbolic level. The A-MBR method is scalable, so
the algorithm can easily be mapped onto parallel
architectures. Tests and simulations have been done on
different hardware, e.g. Silicon Graphics, multi-DSP systems
and the CM-5. The target robot platform is equipped with a
laser scanner (made by Sick). Later, a laser image camera (a
laser distance camera manufactured in our company) will be
applied. We are working on the extension of our approach to
the recognition of 3D objects. On a Silicon Graphics
workstation, recognition of a simulated 3D object takes less
than one second in a scenario of 4 selected objects and a
database of 1024 objects. We intend to enlarge the A-MBR
database with real 3D object data in combination with
symbolic knowledge. We are investigating further extensions
and new learning methods and plan to integrate them in our
system architecture.


Abstract

B.F.  Skinner  developed  a  detailed  analysis  of  how  the 
remarkable  phenomena  of  language  and  symbol  systems  could 
emerge  from  the  relatively  simple  processes  of  operant 
conditioning.  The  creation  of  computer  models  of  operant
conditioning  (adaptive  critic  agents)  finally  enables  this
controversial  theory  to  be  tested.  This  paper  describes  simulations
training  such  agents  following  procedures  derived  from  Skinner's 
analysis,  similar  to  those  used  to  train  humans.  The  simulations 
demonstrate  several  of  the  basic  elements  of  symbol  systems,  but 
more  importantly  that  they  combine  to  produce  significant 
emergent  performances  such  as  following  memorized  rules.  Two 
simulations  contradict  one  of  the  supposed  limitations  of 
conditioning  theories  of  language,  that  they  do  not  produce 
generalization  to  untrained  situations.  These  results  encourage 
further  research  in  this  parsimonious  theoretical  direction. 

Keywords:  emergence  of  language,  computer 
simulations,  operant  conditioning,  syntax,  rules 

1.  Introduction 

One  of  the  main  controversies  about  symbol 
systems  and  language  is  whether  they  can  emerge  from 
learning,  and  if  so  whether  specialized  brain  mechanisms 
are  required.  B.F.  Skinner's  book,  Verbal  Behavior  [19], 
presented  an  extremely  detailed  analysis  of  how  an  adaptive 
autonomous  system  can  learn  a  complex  symbol  system 
based  on  selection  of  behavior  by  its  consequences,  i.e., 
operant  conditioning.  There  are  currently  800  professionals 
in  a  special  interest  group  applying  and  expanding  Skinner's 
analysis  in  research  and  clinical  work.  However,  despite  the 
considerable  productivity  of  this  theory,  it  has  had  little 
impact  outside  the  field  of  behavioral  analysis.  The  author 
attributes  this  lack  of  impact  to  the  fact  that  verbal  behavior 
research  of  this  kind  with  human  and  animal  subjects  is 
inevitably  open  to  question  for  two  reasons:  (1)  The 
daunting  complexity  of  variables  cannot  be  reduced  by 
imposing  experimental  controls  with  young  human  subjects 
as  is  done  with  animals,  and  (2)  We  do  not  know  important 
properties  of  the  system  we  are  studying:  for  example,  when
a  subject  quickly  learns  a  verbal  function  in  our  procedures,
is  it  attributable  to  operant  conditioning  alone  or  is  there 
also  a  specially-adapted  brain  mechanism? 

Therefore,  the  recent  development  of  computer 
systems  that  can  simulate  the  learning  processes  of  operant 
conditioning  is  especially  valuable.  Such  computer  "agents" 
enable  us  to  overcome  both  of  these  problems:  we  can  know 
the  exact  characteristics  of  an  agent,  and  we  can  control  the 
agent's  environment  through  its  entire  history.  At  the  least, 
such  simulations  can  provide  sufficiency  tests  of  hypotheses 
such  as  Skinner's.  Moreover,  if  the  training  process  is 
similar  to  what  humans  experience,  and  if  the  acquisition 
process  follows  patterns  like  those  of  humans,  the 
simulations  provide  a  stronger  argument  that  human  verbal 
behavior  is  learned  the  same  way.  We  advocated  such 
research  in  the  early  days  of  neural  networks  [7,8],  and  have 
published  earlier  descriptions  and  research  in
[7,11,12,13,21,22].

This  paper  gives  an  overview  of  seven  years  of 
research  on  the  training  of  verbal  behavior  in  adaptive 
autonomous  computer  models  that  simulate  operant 
conditioning,  along  with  more  detail  on  three  pertinent 
studies.  We  have  used  neural  network  models  of  the 
adaptive  critic  type  [1,15].  The  architecture  is  shown  in 
Figure  1,  and  detailed  descriptions  and  source  code  for  our
system  are  provided  in  [13].  The  training  has  closely
followed  guidelines  suggested  by  Skinner's  detailed  analysis, 
with  exact  training  procedures  specified  in  computer-based 
training  (CBT)  format.  The  agent  and  training  software  is 
written  in  Java  and  operates  in  environments  written  in 
VRML  2.0,  though  many  of  the  older  simulations  have  not 
yet  been  converted  from  the  original  Smalltalk  versions. 


2.  Overview  of  Skinner's  Theory 

Skinner's analysis requires very limited
assumptions about the properties of organisms: that they be
capable of operant and classical conditioning. Operant
conditioning could be roughly defined as the selection of
behavior by the value of its consequences, where we assume
the primary values have themselves been selected by a
genetic evolutionary process (e.g., water is reinforcing
because organisms which are so reinforceable are more
likely to survive; see [4,20]). Reinforcing events strengthen
the connections between recently-emitted responses and the
stimuli which were present. Secondary conditioning is very
important in learning, and is nicely modeled in adaptive
critic systems. None of these assumptions is very
controversial either to adaptive agent modelers or to animal
researchers. Note that the agent architecture in Figure 1
contains only elements necessary for operant conditioning,
nothing specialized for symbolic processes.

Figure 1. Agent architecture: a type of adaptive critic
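
To make the notions of reinforcement and secondary conditioning concrete, the following is a minimal tabular sketch of one adaptive critic update in the temporal-difference style. It is not the network described in [13]: the table representation, the learning rates `alpha` and `beta`, the discount `gamma` and the state and response labels are all assumptions chosen for illustration.

```python
def critic_step(values, prefs, state, response, reward, next_state,
                alpha=0.1, beta=0.1, gamma=0.9):
    """One update of an assumed tabular adaptive critic.
    values: dict state -> predicted value (the critic).
    prefs: dict (state, response) -> response strength (the actor)."""
    # Surprise signal: obtained reward plus discounted value of what
    # follows, minus what was predicted for the current state.
    delta = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * delta
    key = (state, response)
    # The recently emitted response is strengthened in the presence of
    # the stimuli that were present, in proportion to the surprise.
    prefs[key] = prefs.get(key, 0.0) + beta * delta
    return delta


values, prefs = {}, {}
# Primary reinforcement: echoing the sound "R" earns simulated calories.
d1 = critic_step(values, prefs, "hear R", "say R", 1.0, "end")
# Secondary conditioning: a state that merely leads to "hear R" gains
# value, and the response made there is strengthened without any reward.
d2 = critic_step(values, prefs, "see red", "look", 0.0, "hear R")
```

After the second call the entry for `("see red", "look")` is positive even though no primary reinforcer was delivered, which is the sense in which secondary conditioning falls out of the critic.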

Skinner  proposed  several  basic  units  of  verbal 
relations,  and  argued  that  the  complexity  of  human  verbal 
behavior  and  symbol  systems  is  produced  by  variations  and 
combinations  of  these  units  in  complex  environments.  The
basic  units  are  relatively  simple  and  easily
demonstrated  (see  below).  Therefore,  the  two  main
open  questions  seem  to  be:

1.  Do  the  basic  units  of  verbal  behavior 
actually  combine  to  produce  complex  functional 
behavior? 

2.  Does  learning  generalize  to  untrained 
situations,  as  it  must  to  explain  human  verbal 
behavior? 


The  basic  units  are  the  tact,  intraverbal, 
mand,  and  autoclitic,  along  with  a  closely  related 
unit  which  has  been  called  pliance.  A  tact  (cf. 
contact)  is  defined  as  a  verbal  response,  similar  to 
naming,  controlled  by  environmental  stimuli  such 
as  properties  of  objects  or  events  (see  Fig.  2a). 
Neural  network  researchers  have  easily 
demonstrated  similar  "categorizing"  responses 
controlled  by  stimulus  patterns,  though  the  category 
is  generally  represented  simply  by  which  output  is 
activated  rather  than  by  more  complex  naming 
responses.  Tacts  may  be  nouns,  adjectives,  verbs, 
adverbs,  other  parts  of  speech  or  smaller  units  such
as  suffixes.  An  intraverbal  (Fig.  2b)  is  a  verbal
response  to  a  verbal  stimulus,  for  example,  saying 
"4"  in  response  to  hearing  "2  plus  2  is"  or  saying 
"Denver"  in  response  to  hearing  "The  capital  of 
Colorado  is".  Intraverbals  are  often  considered 
trivial,  yet  all  explicit  knowledge  can  be  stated  as 
intraverbals  (cf.  cognitive  structures  or  world 
"models").  A  mand  (cf.  command)  is  a  verbal 
response  with  a  characteristic  consequence,  which 
is  therefore  typically  evoked  by  corresponding  states 
of  deprivation;  e.g.,  asking  for  water  when  thirsty. 
An  autoclitic  is  a  verbal  response  controlled  by  a 
combination  of  verbal  and  nonverbal  stimuli,  and  is 
therefore  responsible  for  most  grammatical 
structuring.  Pliance  (cf.  compliance)  is  a  nonverbal 
response  to  a  verbal  stimulus;  e.g.,  lifting  an  arm  upon 
hearing  "Lift  your  arm"  (Fig.  2c). 

Each of these units has well-defined stimuli and
responses, and they can be easily demonstrated in adaptive
critic agents, as in the first simulation described. Many
important verbal phenomena are variations of these basic
units; for example, abstraction or analogy occurs when
subsets of properties of objects control tacts, such as
"animal" or "furniture." Other phenomena occur when
variables combine in the complexity typical of the natural
environment, such as tacting "heart attack" vs. "myocardial
infarction" depending on both the condition and the
"audience" stimuli. However, we have suggested above that
the more challenging issues are whether these units combine
to produce complex language-based functioning and
whether they generalize to untrained situations. We will
briefly describe simulations that give initial affirmative
answers to both of these questions.

Figure 2. How Tact, Intraverbal, and Pliance Combine to Produce Rule-Following


3.  Tact  +  Intraverbal  +  Pliance 
=>  Rule-Following 

An  extremely  important  function  of  language  is 
enabling  culture  itself:  the  ability  of  humans  to  describe 
useful  relations  or  rules  to  others,  who  can  then  behave 
effectively  without  having  to  learn  personally  through 
extensive  and  often  dangerous  direct  experience.  Werbos 
[24]  has  shown  the  power,  and  argued  for  the  necessity,  of 
"world  models"  in  intelligent  agents,  but  like  most  theorists 
he  has  created  a  hybrid  system  with  a  separate  module  for 
this  function.  Can  we  simulate  this  functional  language 
using  the  verbal  units  described  in  a  much  simpler,  operant 
system?  Our  formulation  is  as  follows:  verbal  relations  such 
as  rules  can  be  stated  as  intraverbals;  an  example  we  will 
use  is  "If  red  then  move  forward."  For  the  intraverbal  to  be 
functional,  its  elements  must  have  "meaning"  to  the  agent; 
i.e.  both  the  conditions  (IF)  and  the  actions  (THEN)  must 
connect  appropriately  to  the  real  world.  Logically,  we  look 
for  two  corresponding  kinds  of  units:  on  the  sensory  side,  we 
need  a  unit  with  environmental  stimuli  as  inputs  and  with 
verbal  responses  as  outputs;  on  the  response  side,  we  need  a 
unit  with  verbal  inputs  and  with  actions  on  the  environment 
as  outputs.  The  tact  meets  the  former  requirements,  and 
pliance  meets  the  latter.  Figure  2  shows  how  these  units 
should  fit  together  to  produce  rule-following  behavior, 
according  to  this  relatively  simple  formulation. 

Upon  sensing  environmental  stimuli,  the  agent 
utters  a  corresponding  tact,  which  produces  stimuli  that
evoke  the  intraverbal  response;  the  intraverbal  produces
stimuli  that  instruct  the  appropriate  action  on  the 
environment.  Observations  of  humans  [6,18]  suggest  that 
humans  often  behave  in  exactly  this  way  when  following 
memorized  rules. 

In this simulation, the two rules were "If red then
move forward" and "If green then move back." Our
objective was for the agent to memorize the rules and learn the
meanings of their elements and relations such that it could
follow the rules the first time after learning them, without
having ever done those behaviors in those conditions
previously. First, we trained a naive agent to imitate the
sounds A, B, F, G, I, M, R, and U by presenting the sounds
and giving simulated calories if the agent produced the
corresponding sounds. Using those sounds as prompts
which were then faded out, we trained the agent to tact the
color of red, green, and blue objects, where saying "red"
required the agent to say "R" then "A", green was "G" "I",
and blue "B" "U". For example, a red object was presented
along with the sound sequence "R A", after which the
agent's correct imitation of the sounds was reinforced. The
"R A" prompt was faded until the agent could name the
color without any prompt. Similarly, prompting was used to
train the agent to say "R A M F" (= "red, move forward")
and "G I M B" (= "green, move back"), with no colors
present. (Note that humans typically say rules in a minimal
form without "If" or "then" [18].) Finally, we trained the
agent to move forward or back after hearing and repeating
the instructions "move forward" or "move back" ("M F", "M
B"). The detailed training procedure is described in [11,12],
and the exact code for agent and training is in [13]. The
training procedures are like those a behavioral trainer might
use with humans. At the end of this training, the agent
moves forward when it sees red and moves back when it sees
green, the very first time it sees these colors after learning
the rules. This rule-following is mediated by the sequence
of verbal responses as we predicted, exactly as observed with
humans following rules.
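
The chain tact, then intraverbal, then pliance can be traced with a deliberately simplified sketch. In the simulation each unit is learned by the adaptive critic network; here each unit is replaced by a fixed lookup table, which is an assumption made only to show how the three units compose. The sound codes are those given in the text; the function name `follow_rule` is hypothetical.

```python
# Each table stands in for one trained verbal unit (assumed representation).
TACT = {"red": "R A", "green": "G I", "blue": "B U"}  # seen color -> spoken name
INTRAVERBAL = {"R A": "M F", "G I": "M B"}            # spoken name -> rest of the rule
PLIANCE = {"M F": "forward", "M B": "back"}           # heard instruction -> movement


def follow_rule(seen_color):
    """Return (utterances, movement). Movement is None when some link
    in the chain has not been trained."""
    utterances = []
    name = TACT.get(seen_color)
    if name is None:
        return utterances, None
    utterances.append(name)        # the agent hears its own tact ...
    instruction = INTRAVERBAL.get(name)
    if instruction is None:
        return utterances, None
    utterances.append(instruction)  # ... which evokes the intraverbal
    return utterances, PLIANCE.get(instruction)
```

Calling `follow_rule("red")` yields the utterances "R A" and "M F" followed by a forward movement, although no table links the color red directly to moving; for blue, which has a tact but no rule, the chain stops after naming.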


4.  Does  Learned  Syntax  Generalize? 

Two  simulations  have  shown  that,  contrary  to 
simplistic  predictions  ([3],  but  see  [14,17]),  this  process  of 
learning  can  produce  grammatical  ordering  that  generalizes 
to  novel  instances,  as  it  must  to  explain  human  performance. 

The  first  simulation  shows  that  after  learning  only 
a  few  examples  of  saying  adjectives  (color)  before  nouns 
(based  on  shape),  the  agent  will  do  the  same  in  a  novel  case. 
In  this  simulation,  the  agent  was  first  taught  to  tact  blue, 
green,  and  red  colors  with  unnamed  shapes,  using  the  same 
procedure  as  in  the  prior  simulation.  Similarly,  the  agent 
was  taught  to  tact  circle,  square,  and  triangle  shapes  with 
unnamed  color.  Of  the  nine  possible  combinations  of  color 
and  shape,  eight  were  then  trained  by  presenting  the 
combined  stimulus  (e.g.,  a  red  triangle)  and  reinforcing  only 
the  correct  sequence  of  verbal  responses  ("red  triangle",  or 
"R  T"  in  the  simplified  language).  It  is  very  notable  that 
even  before  the  end  of  this  training  phase,  the  agent  named 
two  of  these  stimuli  correctly  the  very  first  time  they  were 
presented.  This  cannot  be  explained  by  chance,  since  the 
agent's  natural  tendency  is  to  repeat  reinforced  responses,  so 
that  it  would  say  "R  R"  if  it  happened  to  say  "R"  first  and 
received  reinforcers,  as  occurred  at  this  stage  of  training. 
When  the  ninth  combination  was  presented,  the  agent  used 
the  correct  grammatical  ordering  the  first  time  it  ever 
encountered  a  red  circle,  even  in  a  test  when  hundreds  of 
irrelevant  responses  occurred  between  the  last  training  trial 
and  the  first  test  presentation. 

Simulations  enable  us  to  analyze  and  understand 
surprising  behaviors  such  as  this.  We  have  found  that  an 
extremely  useful  relation  the  agent  learns  is  that,  after 
saying  something,  saying  it  again  will  rarely  produce 
reinforcement  even  though  the  conditions  that  evoked  the 
response  the  first  time  are  still  present  (e.g.,  red  is  still 
present).  Not  only  is  it  generally  true  that  listeners  do  not 
want  to  hear  the  same  information  twice,  but  suppression 
also  functions  as  a  modulator  for  syntax  in  many  cases.  In 
the  red  circle  case,  saying  "red"  suppressed  saying  it  again 
so  that  saying  "circle"  would  become  relatively  stronger. 
The  architecture  in  Figure  1  enables  this  to  be  learned  by  the 
well-known  mechanism  of  recurrence:  the  agent's  responses 
produce  stimuli  for  subsequent  responses.  This  mechanism 
will  show  up  again  in  the  next  simulation. 
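
The effect of this learned suppression on word order can be illustrated with a toy model. It assumes that each candidate response has a scalar strength evoked by the scene and that emitting a response lowers its own strength by a fixed amount; in the simulation this relation is learned by the recurrent network, so the numbers and the function name `describe` are illustrative only.

```python
def describe(stimulus_strengths, suppression=1.0, length=2):
    """stimulus_strengths: dict response -> strength evoked by the scene.
    After a response is emitted, the stimuli it produces reduce its own
    strength by `suppression` (an assumed constant)."""
    strengths = dict(stimulus_strengths)
    said = []
    for _ in range(length):
        response = max(strengths, key=strengths.get)
        said.append(response)
        strengths[response] -= suppression
    return said


scene = {"red": 0.9, "circle": 0.7, "square": 0.1}
with_suppression = describe(scene)                      # red, then circle
without_suppression = describe(scene, suppression=0.0)  # red, then red
```

Without suppression the strongest response is simply repeated, which corresponds to the "R R" tendency noted above; with suppression the next strongest response takes over.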

The  second  simulation  addressed  a  more 
challenging  situation.  Can  an  operant  agent  learn  to  follow 
implicit  grammatical  rules  to  compose  novel  relational 
statements  describing  its  environment?  For  example,  can  an
agent  learn  to  make  descriptive  statements  of  the  type  "X  is
left  of  Y"  when  presented  with  pairs  of  objects  in  left-right 
relations,  including  the  very  first  time  either  of  the  objects 
has  been  in  that  relation? 

The  test  was  for  the  agent  to  learn  to  say,  for 
example,  "Carp  is  left  of  tuna"  when  those  objects  were 
presented  visually  in  that  relation,  even  when  neither  object 
had  ever  been  seen  in  that  relation  before.  The  agent  was 
taught  to  name  four  visual  patterns,  called  shark,  tuna,  carp, 
and  jellyfish  (named  "S",  "T",  "C"  and  "J"  in  our  simplified 
language).  Then  these  objects  were  presented  within  its 
field  of  vision  but  not  directly  in  front  of  its  eyes,  so  that  it 
learned  to  move  its  eyes  to  look  directly  at  each  object  and 
then  to  name  it  (this  procedure  is  consistent  with  the  concept 
of  "active  vision"  [2]).  In  the  main  training  procedure,  the 
agent  received  only  18  training  trials  of  2  cases  each,  one  in 
which  a  shark  was  left  of  a  tuna  and  one  in  which  a  jellyfish 
was  left  of  a  carp.  The  agent  learned  to  look  at  the  nearest 
object,  name  it,  look  left  or  right  as  necessary  to  look  at  the 
second  object,  say  either  "is  leftOf"  or  "is  rightOf"  ("I  L"  or
"I  R")  as  demanded  by  the  situation,  then  to  name  the 
second  object.  Correct  responses  were  reinforced  with 
simulated  calories.  After  this  brief  training,  a  test  situation 
was  presented:  a  carp  to  the  left  of  a  shark.  Even  though 
those  objects  had  not  been  seen  in  those  positions  before,  the 
agent  correctly  said,  "Carp  is  left  of  shark"  ("C  I  L  S").  How
did  it  learn  this  generalized  syntax  after  only  2  examples  of 
it?  As  in  the  previous  simulation,  an  examination  of  its 
network  at  each  point  in  time  showed  how  locally-available 
cues  were  adequate.  Upon  seeing  an  object,  its  strongest 
initial  response  is  to  orient  toward  it;  then  seeing  it  directly 
ahead,  the  strongest  response  is  to  name  it.  Naming  it 
produces  auditory  and  proprioceptive  stimuli  that  suppress 
naming  it  again,  so  orienting  toward  the  second  object 
becomes  the  strongest  response.  Now  there  are  two  strong 
responses:  one  is  to  name  the  second  object  because  the 
agent  is  now  looking  directly  at  it,  and  the  other  response  is 
to  say  the  relation  "is  leftOf"  because  of  the  stimuli  from
having  just  done  the  corresponding  orienting  response.  Its 
history  of  reinforcement  has  produced  a  network  that  gives 
the  relational  response  more  strength,  given  this  stimulus 
pattern.  But  after  stating  the  relation,  the  response- 
produced  stimuli  suppress  saying  it  again  and  the  naming 
becomes  strongest.  This  is  a  very  nice  emergent  solution 
because  the  agent  can  learn  many  different  relational 
assertions  without  much  mutual  interference.  Such  control 
of  relational  statements  by  stimuli  from  the  agent's  own 
prior  movements  is  quite  general,  according  to  Skinner's 
[19,  pp  340-343]  expansion  upon  Tooke's  [23]  argument 
that  relational  words  generally  have  their  roots  in  actions. 

5.  Other  Verbal  Phenomena  That  Have  Been  Simulated

The  following  is  a  listing  of  verbal  phenomena  the  author 
has  simulated,  including  ones  described  above: 

Generalized echoic behavior: imitation of sounds
Generalized textual behavior: reading from text
Tacting: naming objects, actions, properties of objects and actions
Intraverbals: verbal associations, phrases, sentences, rules
Instruction-following ("Move back" -> move back)
Generalized word ordering (adjectives before nouns: "blue circle")
Generalized relational descriptions of scenes ("Tuna is left of shark")
Generalized identification of matching vs. non-matching pairs of objects (Are 2 objects the same?)
Same, but with delay between stimulus presentations
Sequences of verbalizations maintained by reinforcement only at the end of the sequence
Symmetry (learn "A = B", say "B = A")
Transitivity (learn "A implies B" and "B implies C", say "A implies C")
Generalized compliance with memorized rules when condition is first encountered
Interplay of rule-following and direct learning from experience
Creativity: recombine learned responses to solve problems [9]
Rapid training of verbal responses using prompt and fade procedures, as used with humans

Some  of  the  verbal  phenomena  we  would  like  to 
simulate  in  the  near  future  include:  using  real  voice  input 
and  articulatory  speech  output;  more  complex  grammar 
(e.g.,  "The  cat  that  ate  the  mouse  jumped");  using  larger, 
English-like  words;  training  large  verbal  repertoires; 
generalized  use  of  "not";  rhymes  and  puns;  self-editing; 
combining  relational  statements  ("Frog  is  on  log"  +  "Log  is 
beside  tree");  and  tacting  of  private  events  such  as  pain. 


6.  Discussion 

Our  simulations  serve  as  sufficiency  proofs  that 
adaptive  autonomous  agents  of  the  adaptive  critic  class  can 
learn  important  and  fairly  complex  functions  of  symbol 
systems  with  proper  training.  A  critical  property  of  this 
analysis  is  that  it  brings  language  into  a  larger 
biological/economic framework: Each set of behaviors
produces consequences sufficient to maintain it, or
equivalently,  it  contributes  to  the  fitness  of  the  organism. 
Rule-following  generally  produces  valuable  results,  since 
people  learn  to  discriminate  reliable  from  unreliable  rule- 
givers,  as  an  inherent  part  of  operant  learning.  In  contrast, 
mechanistic  theories  of  rule-following  which  produce 
automatic  compliance  fail  to  explain  how  people  learn  to 
avoid  bad  advice.  Even  more  stringently  than  overall  value, 
Skinner's  theory  requires  that  every  action  of  each  agent  at 
each  point  in  time  be  learnable  and  sustainable  by  actual 
consequences [19, pp. 84-90], a point emphasized in a
simulation of a two-agent ecology in [10].

Even  if  the  training  used  in  these  simulations  is 
characteristic  of  disciplined  behavioral  training  techniques, 
some  critics  may  object  that  it  is  uncharacteristic  of  human 
experience.  While  typical  caregivers  are  not  as  consistent  as 
professionals,  observations  show  that  humans  provide 
extensive  language  training  in  the  natural  course  of 
interacting  with  children  [16].  For  example,  a  parent  might 
show  a  child  blue  objects,  say  "Blue"  to  each,  and  show  a 
positive  reaction  to  the  child's  saying  anything 
approximating  "Blue."  In  subsequent  cases,  prompts  will 
typically  be  made  weaker  and  better  pronunciation  will  be 
required. 

The  results  of  these  simulations  support  Skinner's 
analysis,  while  its  consistency  with  a  broad  biobehavioral 
perspective  gives  the  theory  further  appeal.  The  scientific 
principle  of  parsimony  compels  us  to  pursue  the  hypothesis 
that  symbolic  language  functions  emerge  from  accepted 
processes  of  operant  conditioning  without  any  need  for 
language-specific  brain  mechanisms. 
ABSTRACT: A mathematical theory of a symbol as an
adaptive  process  is  presented.  It  is  founded  on  the  theory  of 
modeling  fields,  in  which  learning  is  based  on  adaptive 
internal  models.  Quantum  computation  algorithms  of  the 
modeling  field  theory  are  developed.  I  discuss  computational, 
philosophical,  and  physical  aspects  of  the  dynamic  theory  of 
the  symbol. 
1. INTRODUCTION

The popularity and subsequent fall from favor of
"Symbolic AI" left many researchers with a distaste for
the word "symbol". But the impression that "Symbolic AI"
represented the mathematics of symbols is very wrong. A
symbol is not a monumental piece of bronze sitting on a
foundation of stone. A symbol is a fleeting vortex of
interacting perceptions, feelings, a priori models, adaptation,
attention, behavior and concept formation. In other words, a
symbol is a process. It is a process of thought.

Semiotics is a science devoted to studying signs and
symbols. The founders of semiotics (Peirce, 1935; Morris,
1971) introduced the notion of a sign as a trilateral unity of
sign-vehicle (the medium used as a sign), designatum (the
object to which the sign refers) and interpretant (or interpreter,
a mind or an intelligent system, which interprets a
sign-vehicle). The process of interaction within this triadic unity
is called semiosis. Morris decomposed this process into
three dyadic relationships-processes: syntactics (relations
among sign-vehicles), semantics (relations between
sign-vehicles and their designata), and pragmatics (relations
between sign-vehicles and their interpreters). A contemporary
tendency is to use the word semiosis for the learning process at
a system level involving multiple triadic processes. When a
single triadic process is concerned, the word sign is used in
relatively simple, non-adaptive cases of semiosis, when there is
an a priori known and fixed straightforward way of relating a
sign-vehicle to its designatum (such as a look-up table); and the
word symbol is used in complicated cases of single-triadic
semiosis, when an adaptive process is needed to establish the
relationship. This adaptive process of semiosis is a symbol.

According to Pribram (1971), signs within the brain are
acts of communication that are invariant to the context, while
symbols are context dependent. Signs are the results of the
associative cortex affecting the input sensory systems. These
are the a priori, less-adaptive aspects of the internal models.
Symbols, according to Pribram, are the results of interaction
between the frontal lobes and the limbic system; they are
stimulants to action and are sensitive to context.

In  this  paper,  mathematical  and  physical  theories  of  the 
symbol  are  developed.  They  are  based  on  the  modeling  fields 
theory  (MFT)  developed  previously  by  the  author  (Perlovsky, 
1996;  Perlovsky  et  al,  1997).  MFT  combines  a  priori 
knowledge  and  adaptivity  of  the  internal  model.  The  global 
model  describing  "the  world"  is  composed  of  multiple  "local" 
models  (similar  to  a  visual  field  being  composed  of  edges, 
etc.).  MFT  establishes  a  correspondence  between  subsets  of 
local  models  and  subsets  of  input  data,  while  adaptively 
learning  models.  This  process  is  similar  to  a  central  problem 
in  several  areas  of  artificial  intelligence  and  pattern  recognition 
[Winston,  1984;  Negahdaripour  &  Jain,  1991;  Segre,  1992]. 
Traditional  methods  of  solving  problems  of  this  kind  lead  to 
algorithms  of  exponential  complexity,  which  are  not 
physically  realizable  even  for  problems  of  moderate 
complexity, and which cannot therefore serve as physical
models  of  intellect.  MFT  leads  to  algorithms  of  linear 
complexity.  A  computational  system  of  MFT  is  composed  of 
multiple  interacting  loops  adapting  individual  models  and 
competing  for  evidence  in  the  input  data.  Each  loop  involves 
an  individual  internal  model,  a  measure  of  its  similarity  with 
the  data,  and  its  adaptation  mechanism  that  increases  the 
similarity.  Each  loop  is  a  semi-independent  intelligent  agent 
performing  an  adaptation-recognition  of  a  category.  It 
involves  a  sign-vehicle  that  is  the  category  name  (or  internal 
MFT  message-code),  a  designatum  that  is  a  subset  of  input 
data  corresponding  to  the  category,  and  interpretant  that  is  the 
MFT  system  that  interprets  and  acts  upon  the  recognized 
category.  Thus,  MFT  agents  recognizing  and  adapting 
categories  are  dynamic  symbol-processes.  The  dynamics  of 
agent-symbols,  the  process  of  semiosis,  is  composed  of 
syntactics,  semantics,  and  pragmatics. 

Where are the dynamic symbol processes physically
realized in the brain? A traditional interpretation is that they
are realized by networks of neurons, such as described in
(Perlovsky et al, 1997). An alternative interpretation is that
each neuron has a microstructure capable of performing
complicated computations. These sub-neuron computations
are hypothesized to be performed by quantum processes taking
place in the microtubular neuronal substructure [Hameroff &
Watt, 1983]. Keeping this possibility in mind, I describe a
quantum system implementing MFT. Section 2
summarizes the mathematics of classical MFT. Section 3
describes quantum computation MFT algorithms, QMFT.
Section 4 discusses results and relationships between the
mathematics, the Kantian theory of mind, and classical semiotics.
2. MODELING FIELD THEORY

MFT considers (1) a set of input data $\{X(n),\ n \in N\}$,
where each member is a vector in D-dimensional space,
$X = \{X_d,\ d = 1, \ldots, D\}$; (2) a set of adaptive categories
$\{h \in H\}$, which are characterized by internal parameters
$\{S_h\}$ and by models of the data $\{M_h(S_h, n)\}$; and (3) a
similarity measure between the sets of models and data,
$L(\{X\},\{M\})$. The set of parameters is finite,
$S_h = \{S_{ah},\ a = 1, \ldots, A\}$, but not necessarily limited,
and the set of categories is not fixed: its cardinality $H$ is finite,
but may vary in the process of learning. The similarity measure is
designed so that it treats each model as an alternative for each
piece of data,

$$L(\{X\},\{M\}) = \prod_{n \in N} \sum_{h \in H} r(h)\, \ell(X(n) \mid h), \qquad (1)$$

where $\ell$ is a conditional partial similarity between data vector
$X(n)$ and model $M_h$. For example, $\ell$ can be selected as a
probability density function. Then $L$ is a total likelihood (this
interpretation does not require statistical independence among
data vectors $n$ and $n'$: dependencies can be accounted for by
considering $X(n')$ as parameters of the models for the data
vector $n$).
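As a numerical illustration of eq. (1), the following minimal sketch evaluates the logarithm of the total similarity for one-dimensional data. It assumes, purely for illustration, that the partial similarity is a unit-variance Gaussian density and that the weights r(h) are equal; the function names and data values are hypothetical and are not taken from the MFT literature.

```python
import math

def gaussian_pdf(x, mean):
    """Unit-variance 1-D Gaussian density, standing in for l(X(n)|h)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def log_similarity(data, means, rates):
    """ln L: sum over n of ln( sum over h of r(h) * l(X(n)|h) ), the log of eq. (1)."""
    total = 0.0
    for x in data:
        total += math.log(sum(r * gaussian_pdf(x, m) for r, m in zip(rates, means)))
    return total

# Hypothetical data: two groups of observations near -2 and +2.
data = [-2.1, -1.9, 2.0, 2.2]
rates = [0.5, 0.5]

good = log_similarity(data, [-2.0, 2.0], rates)  # models placed on the groups
poor = log_similarity(data, [0.0, 0.0], rates)   # both models placed at the origin
print(good, poor)
```

Models that sit on the data give the larger value of ln L, which is the quantity the recognition and adaptation process is meant to increase.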

The problem consists in concurrent recognition and
adaptation of categories. Adaptation is achieved by estimating
internal parameters $S$, and recognition consists in obtaining a
partition of data among categories,

$$\{N\} = \{N_1 \mid N_2 \mid \ldots \mid N_H\}, \qquad (2)$$

that decides which model $h$ corresponds to an observation $n$
(here $N_h$ is a subset of $N$). The recognition and adaptation
process shall maximize the similarity (1). When likelihood is
used as the similarity measure, this is a problem of
maximum likelihood estimation.

Categories  activated  by  high  similarity  values  produce 
actions,  including  generation  of  messages  transmitted  within 
the  intelligent  system,  which  acknowledge  the  category 
activation.  Messages  include  the  category  name  code  and 
model  parameters.  They  can  serve  as  input  data  for  other 
categories  and  can  be  used  for  control  of  actuators. 
Correspondingly,  data  are  sensory  data  or  internal  messages, 
and  models  represent  patterns  of  messages  and  sensory  data. 
The  above  mathematical  framework  is  fairly  broad  and 
encompasses  most  formulations  of  intelligent  adaptive 
systems.  It  includes  as  particular  cases,  statistical  and  model- 
based  pattern  recognition,  complex  adaptive  systems,  neural 
networks,  and  systems  of  intelligent  agents.  The  above 
formulation  addresses  "higher-level"  intelligence  functions  of 
recognition and adaptation. A complete intelligent system
would include lower-level drives for survival and reproduction
and, in the case of an artificial system (a robot or infobot), drives
for performing the specific tasks it was designed for. Each
individual model, $M_h$, together with the process of its
adaptation and recognition, defines a symbol-process.

When the set of observations, $N$, corresponds to a
continuous flow of signals, for example, a flow of visual
stimuli in time and space, it is convenient to consider, instead
of eq. (1), its continuous version,

$$L = \exp \int_{N} \ln\Big( \sum_{h \in H} r(h)\, \ell(X(n) \mid h) \Big)\, dn, \qquad (3)$$

where $N$ is a continuum, such as time-space. In this case,
models describe a continuous modeling field, conditional
partial similarities can be compared to a Lagrangian, and
maximization of similarity $L$ can be compared to
minimization of action in physical field theory.

The MFT solution of this problem uses fuzzy adaptive logic
[Perlovsky, 1996]. Adaptive fuzzy class memberships $f(h \mid n)$,
associating each data vector $n$ with each category $h$, are defined
as:

$$f(h \mid n) = \frac{r(h)\, \ell(X(n) \mid h)}{\sum_{h' \in H} r(h')\, \ell(X(n) \mid h')}. \qquad (4)$$
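The membership formula of eq. (4) can be sketched in a few lines. The example below assumes a unit-variance Gaussian partial similarity and equal weights r(h); these choices, and all names in the code, are illustrative assumptions rather than part of the theory.

```python
import math

def partial_similarity(x, mean):
    """Unit-variance 1-D Gaussian density, an assumed form of l(X(n)|h)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def memberships(x, means, rates):
    """f(h|n) of eq. (4) for one data value x: weighted similarities normalized over h."""
    weighted = [r * partial_similarity(x, m) for r, m in zip(rates, means)]
    norm = sum(weighted)
    return [w / norm for w in weighted]

# A data value at 1.5 belongs mostly, but not exclusively, to the model at +2.
f = memberships(1.5, means=[-2.0, 2.0], rates=[0.5, 0.5])
print(f)
```

The memberships of one data vector always sum to one over the categories, so each model acts as an alternative explanation of that vector.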

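In the case of the maximum likelihood estimate, upon
convergence of the estimation procedure, $f(h \mid n)$ can be
interpreted as a posteriori probabilities. The internal dynamics
of the Modeling Fields (MF) is given by

$$\frac{d f(h \mid n)}{dt} = f(h \mid n) \sum_{h'=1}^{H} \Big\{ [\delta_{hh'} - f(h' \mid n)] \cdot \frac{\partial \ln \ell(X(n) \mid h')}{\partial M_{h'}}\, \frac{\partial M_{h'}}{\partial S_{h'}} \cdot \frac{d S_{h'}}{dt} \Big\}, \qquad (5)$$

$$\frac{d S_h}{dt} = \sum_{n \in N} f(h \mid n)\, \frac{\partial \ln \ell(X(n) \mid h)}{\partial M_h}\, \frac{\partial M_h}{\partial S_h}, \qquad (6)$$

where

$$\delta_{hh'} = 1 \ \text{if}\ h = h', \quad 0 \ \text{otherwise}. \qquad (7)$$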

Parameter $t$ is the time of the internal dynamics of the
MF system. A theorem was proved that equations (5)
through (7) define a convergent dynamical system of MF with
stationary states defined by $\max_{\{S_h\}} L$. It follows that the
stationary states of an MF system give the maximum
similarity solution of the category adaptation and recognition
problem. When likelihood is used as the similarity, the
stationary values of parameters $\{S_h\}$ are asymptotically
unbiased and efficient estimates of these parameters [Cramer,
1963]. The computational complexity of the MF method is
linear in $N$,

$$\mathrm{complexity(MF)} = C_2 \cdot H \cdot N, \qquad (8)$$

where $C_2$ is defined by the relaxation time of the MF system
and is likely to be independent of $N$ (or to depend on it only
weakly). In this way the MF system solves the problem of the
exponential complexity associated with traditional methods.
Fuzzy class memberships $f(h \mid n)$ define a fuzzy partition of
the set $N$, which can be used to define a non-fuzzy partition of
the type of eq. (2): for every $n$,
$h = \arg\max_{h'} f(h' \mid n) \Rightarrow n \in N_h$.
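The relaxation described above can be sketched numerically under strong simplifying assumptions: one-dimensional data, unit-variance Gaussian similarities whose only parameter is the mean (so that the derivative of ln l with respect to the mean is X(n) - S_h), equal weights r(h), and a plain Euler step. Instead of integrating the membership equation separately, the sketch recomputes the memberships from their definition at every step. The step size, data, and names are hypothetical.

```python
import math

def memberships(x, means, rates):
    """Fuzzy memberships of one data value (Gaussian normalization constants cancel)."""
    weighted = [r * math.exp(-0.5 * (x - m) ** 2) for r, m in zip(rates, means)]
    norm = sum(weighted)
    return [w / norm for w in weighted]

def relax(data, means, rates, dt=0.05, steps=400):
    """Euler integration of dS_h/dt = sum_n f(h|n) * (X(n) - S_h)."""
    means = list(means)
    for _ in range(steps):
        f = [memberships(x, means, rates) for x in data]
        for h in range(len(means)):
            drift = sum(f[n][h] * (data[n] - means[h]) for n in range(len(data)))
            means[h] += dt * drift
    return means

data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
rates = [0.5, 0.5]
means = relax(data, [-0.5, 0.5], rates)

# Non-fuzzy partition: assign each observation to its strongest category.
hard = [max(range(len(means)), key=lambda h: memberships(x, means, rates)[h])
        for x in data]
print(means, hard)
```

Each step costs on the order of H times N operations, which is the linear scaling stated in the complexity estimate; the number of steps plays the role of the relaxation-time constant.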

MFT describes an intelligent system composed of
multiple adaptive intelligent agents-symbols: each
category-model-symbol is "dormant" until activated by a high
similarity value. Every piece of data may activate several
categories-symbols, which "compete" with each other while
adapting to the new piece of data. Adaptation and other
actions by activated symbols are largely independent of
most other symbols, except for those competing with each
other. The most important actions by activated symbols are
transmission of messages and sustaining the loop of
adaptation of their own parameters. Evolutionary computation
is naturally incorporated within MFT: transmitted messages,
including category parameters and other aspects of
category-model-symbols, are strings-objects of evolutionary
computations and are employed by modeling field systems for
adaptation of their structural components that could not be
incorporated in a continuous fashion and are not the subjects
of adaptive MFT symbol-loops. The overall architecture of
MFT categories-symbols can combine heterarchical and
hierarchical organization.


3.    QUANTUM  MODELING  FIELD  THEORY 
(QMFT) 

It is possible that quantum processes play a role in
neuronal adaptation [Hameroff & Watt, 1982]. Therefore,
apart from implementing MF as a computer algorithm, it is
interesting to consider the possibility of realizing MF as a
quantum system, QMF. Quantum computing has received
significant attention as a potential method of breaking through
the limitations of classical computing paradigms since
Feynman [1982, 1986] drew attention to this area of
research. The following two limitations of classical
computational systems are expected to be surpassed. First,
classical systems dissipate a finite amount of energy (~kT) per
bit for every operation. Second, a number of important
problems in classical computational intelligence, in
number theory, and in other fields are very hard in that their
solutions require an exponentially large number of
computational steps as a function of the problem complexity.
(While the MF discussed in the previous section addresses the
exponential computational complexity of symbol
adaptation, quantum algorithms still promise a tremendous
advantage in computing power.) Quantum computing is
expected to surpass classical computing for the following
reasons. First, quantum computation proceeds without energy
dissipation until the process of quantum measurement, which
can potentially be postponed until the end of the computation
process. Second, a quantum system can exist in a
superposition of multiple states, so that multiple
computational paths (including all possible combinations of path
segments) can potentially be performed in parallel, within a
single process of quantum interference between the quantum
system states. In QMF, the internal dynamics of symbol
adaptation occurs as a process of quantum interference (in place
of eqs. (4)-(7)), and a partition similar to eq. (2) is obtained in
the process of quantum measurement. Let us outline the main
characteristics of QMFT.


QMFT describes a system which is characterized by
quantum states $|h\rangle$ and which interacts with the external
world characterized by quantum states $|n\rangle$. The entire system,
including the QMF system and the external world, is in the
general case described by a quantum state which is a
superposition of states $|h\rangle |n\rangle$,

$$\Psi(t) = \int_{N^*} \sum_{h=1}^{H} C_{hn}\, |h\rangle |n\rangle. \qquad (9)$$

Here integration over $n \in N^*$ includes spatial (and possibly
other) coordinates of the external world but excludes time, $t$.
Because of the probabilistic nature of quantum theory, we
consider similarities given by likelihoods, and fuzzy class
memberships being probabilities. Quantum amplitudes $C_{hn}$
are related to the probabilities defined in eq. (4) according to the
rules of quantum theory,

$$f(h \mid n) = |C_{hn}|^2. \qquad (10)$$

Generally, the QMF system is in a mixed state and is
described by the density matrix [Neumann, 1910; Sakurai,
1985],

$$\rho_{h_1,h_2}(t) = \int_{N^*} C_{h_1,n}\, C^*_{h_2,n}, \qquad (11)$$

or, equivalently, by the density operator

$$\rho(t) = \sum_{h_1,h_2=1}^{H} |h_1\rangle\, \rho_{h_1,h_2}\, \langle h_2|. \qquad (12)$$
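The relation between memberships, amplitudes, and the density matrix can be checked on a small discrete example. The sketch below assumes a finite set of three observations, so the integral over N* becomes a sum, and it uses arbitrary illustrative membership values and phases; none of these numbers come from the text.

```python
import cmath
import math

# Hypothetical memberships f[n][h] for three observations and two categories.
f = [[0.9, 0.1],
     [0.2, 0.8],
     [0.5, 0.5]]
# Hypothetical phases of the amplitudes (arbitrary illustrative values).
phase = [[0.0, 0.3],
         [0.1, 0.0],
         [0.0, 0.0]]

# Amplitudes C[n][h] chosen so that |C_hn|^2 = f(h|n), as in eq. (10).
C = [[math.sqrt(f[n][h]) * cmath.exp(1j * phase[n][h]) for h in range(2)]
     for n in range(3)]

# rho[h1][h2] = sum over n of C_{h1,n} * conj(C_{h2,n}): discrete form of eq. (11).
rho = [[sum(C[n][h1] * C[n][h2].conjugate() for n in range(3)) for h2 in range(2)]
       for h1 in range(2)]

trace = (rho[0][0] + rho[1][1]).real
print(rho, trace)
```

With this unnormalized convention the matrix is Hermitian, its diagonal elements are the membership sums over the observations, and its trace equals the number of observations.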

Equations  of  motion  are  given  by  a  Hamiltonian, 
therefore  the  next  step  is  to  define  the  Hamiltonian  in  such  a 
way  that  the  dynamics  of  a  QMF  system  would  lead  to  a 
solution  of  the  considered  problem  of  the  dynamic  symbol 
adaptation. Two types of quantum systems are considered
below: first, a nonequilibrium quantum statistical system
evolving to a Gibbs distribution and, second, a deterministic
Hamiltonian quantum dynamical system.

3.1 Gibbs Quantum Modeling Field System

Consider the conditional similarities to be given by
conditional probability distribution functions, or conditional
likelihoods. Let us define the Hamiltonian so that the QMF
dynamics would lead to a Gibbs distribution with probabilities
defined by the similarity eq. (1). Following well-known
principles of quantum statistical physics [Feynman, 1972;
Zubarev, 1971], let us define the Hamiltonian through its
relationship to the pdf. The dynamical variables of this
system are the unknown parameters $\{S_h\}$, while the data values
$X(n)$, $n = 1, \ldots, N$, are fixed; therefore the Hamiltonian is

$$H = -\ln \mathrm{pdf}(\{S_h\} \mid X(1), \ldots, X(N)). \qquad (13)$$

Using Bayes' theorem,

$$\mathrm{pdf}(\{S_h\} \mid \{X(n)\}) = \frac{\mathrm{pdf}(\{X(n)\} \mid \{S_h\})\, \mathrm{pdf}(\{S_h\})}{\mathrm{pdf}(\{X(n)\})}. \qquad (14)$$

Here $\mathrm{pdf}(\{S_h\})$ can be considered constant in the absence of
prior information concerning values of these parameters. In this
case, $\mathrm{pdf}(\{X(n)\})$ is also constant, because in the absence of
a priori values of these parameters it has to be invariant. Thus
the Hamiltonian can be written as

$$H = -\ln \mathrm{pdf}(\{X(n)\} \mid \{S_h\}) + \mathrm{const}. \qquad (15)$$

Comparing eq. (15) with eq. (1), we conclude that in the absence
of a priori information on the parameter values, the Hamiltonian
is local and the Hamiltonian density $\mathcal{H}$ can be introduced,

$$\mathcal{H} = -\ln \mathrm{pdf}(X(n)), \qquad H = \int_{N(t)} \mathcal{H} + \mathrm{const}. \qquad (16)$$

Here $N(t)$ stands for the set of observations available at time $t$.
This underlines the fact that the Hamiltonian is defined using
past data only (in the case when $n$ includes time). It is a
nontrivial fact that availability of a priori values for the
parameters may result in a non-local Hamiltonian.

The Hamiltonian eq. (16) defines the dynamics of the
density operator [Feynman, 1972],

$$\rho(t) = \exp\Big(-i \int_t H\Big)\, \rho(0)\, \exp\Big(i \int_t H\Big). \qquad (17)$$
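The structure of eq. (17) can be illustrated on the smallest possible case. The sketch below assumes a two-state system with a constant Hamiltonian equal to omega times the Pauli matrix sigma_x, for which the time integral reduces to H t and the exponential has the closed form cos(omega t) I - i sin(omega t) sigma_x. This Hamiltonian is a stand-in chosen for simplicity; it is not the similarity-based Hamiltonian of eq. (16).

```python
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian conjugate of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def evolve(rho0, omega, t):
    """rho(t) = U rho(0) U^dagger with U = exp(-i H t), H = omega * sigma_x."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    U = [[complex(c, 0.0), complex(0.0, -s)],
         [complex(0.0, -s), complex(c, 0.0)]]
    return matmul(matmul(U, rho0), dagger(U))

# Start with all weight on the first state and evolve for a quarter period.
rho0 = [[complex(1.0), complex(0.0)],
        [complex(0.0), complex(0.0)]]
rho_t = evolve(rho0, omega=1.0, t=math.pi / 4)
print(rho_t)
```

The evolution redistributes weight between the two states while keeping the density matrix Hermitian and its trace unchanged, which is the sense in which the dynamics proceeds without dissipation until a measurement is made.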

In order to complete the correspondence between Gibbs QMF
(GQMF) and classical MF, we define an operator of internal
model parameters $S$ as follows:

$$S = \sum_{h \in H} |h\rangle\, S_h\, \langle h|. \qquad (18)$$

Thus, the partition of the type given by eq. (4) and the values
of internal model parameters $\{S_h\}$ at time $t$ can be obtained in
the process of quantum measurement,

$$f(h \mid n) = \mathrm{Tr}[\, |h\rangle \langle h|\, \rho(t)\, ], \qquad S_h(t) = \mathrm{Tr}[\, S\, \rho(t)\, ]. \qquad (19)$$


An initial state of a GQMF system is specified according
to the initial state of the MF: one can choose initial values of
$S_h$ based on a priori phenomenological considerations; this
leads to initial probabilities $f(h \mid n)$ defined according to
eq. (4), and to initial values of the density matrix $\rho(0)$ defined
according to eq.(17), (18),

$$\rho_{h_1,h_2}(0) = \int_{N^*} e^{-i\phi(h_1 \mid n) + i\phi(h_2 \mid n)}\, [f(h_1 \mid n)\, f(h_2 \mid n)]^{1/2}. \qquad (20)$$

The phases $\phi$ in this expression are left undefined and can be
chosen to suit the concrete problem at hand. One way to avoid
the need to choose phases is to use an alternative initialization
procedure: define a non-fuzzy partition according to eq. (2),
leading to a diagonal density matrix, and compute initial
values of the model parameters $S_h$ from eq. (6).

In  a  GQMF  system  described  above,  the  finite  temperature 
of  the  system  (and  therefore,  the  finite  accuracy  of  the 
computation)  does  not  interfere  with  computations,  but  is  a 
part  of  the  system  dynamics.  On  the  one  hand  this  is  a  highly 
desirable  property  for  any  practical  implementation  of  a 
quantum computing system. On the other hand, interaction with a
thermal  reservoir  leads  to  irreversible  energy  dissipating 
processes  involving  quantum  measurements.  A  desirable 
compromise  is  to  reduce  interactions  with  a  thermal  reservoir 
to  a  relatively  rare  occasion,  which  will  ensure  Gibbs 
distribution  and  will  correct  accumulating  phase  errors,  while 
in  between  these  interactions  a  GQMF  system  will  evolve 
according  to  Schroedinger  equation  (17)  without  energy 
dissipation. 

3.2  Hamiltonian  Quantum  Modeling  Field  System. 

In this alternative approach, let us define the Hamiltonian
so that the Hamiltonian QMF (HQMF) dynamics leads to
symbol adaptation. We will do this for a simplified case of
MF, when the conditional pdfs defining similarities are
Gaussians with unit covariances. Let us emphasize that this
Gaussian-symbol model leads to Gaussian mixture
distributions for classes described by more than one symbol,
which can model probability densities of any shape, not only
Gaussian. In this case the symbol-model parameters are the
Gaussian means, $M_h$. The ML estimation equations for the
parameters $M_h$ can be written as

$$M_h = \frac{\sum_n f(h \mid n)\, X(n)}{\sum_n f(h \mid n)}, \qquad (21)$$

where the denominator has the meaning of the average number
of observations described by symbol $h$,

$$N_h = \sum_n f(h \mid n). \qquad (22)$$

Also, in place of eq. (18) we have

$$M = \sum_{h \in H} |h\rangle\, M_h\, \langle h|. \qquad (23)$$
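Eqs. (21) and (22) amount to membership-weighted averages, as the following small sketch shows. The data values and the membership table are hypothetical and chosen so that the result is easy to verify by hand.

```python
def symbol_statistics(data, f):
    """Return (M, N): means M[h] as in eq. (21) and effective counts N[h] as in eq. (22).

    data[n] is a scalar observation and f[n][h] is the membership of n in h.
    """
    num_symbols = len(f[0])
    N = [sum(f[n][h] for n in range(len(data))) for h in range(num_symbols)]
    M = [sum(f[n][h] * data[n] for n in range(len(data))) / N[h]
         for h in range(num_symbols)]
    return M, N

data = [1.0, 2.0, 10.0]
f = [[1.0, 0.0],   # first observation belongs entirely to symbol 0
     [0.5, 0.5],   # second observation is shared equally
     [0.0, 1.0]]   # third observation belongs entirely to symbol 1
M, N = symbol_statistics(data, f)
print(M, N)
```

Here each symbol accounts for 1.5 observations on average, and the shared observation pulls both means toward 2.0 in proportion to its membership.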

An internal HQMF encoding $|x(n)\rangle$ of the external patterns
$X(n)$ (equivalently, of the states $|n\rangle$ of the external world)
is defined as

$$|x(n)\rangle = \langle n | \Psi(t) \rangle = \sum_{h=1}^{H} C_{hn}\, |h\rangle. \qquad (24)$$

This encoding follows from eq. (9); it is a consequence of the
interaction between the external world and the QMF system.
For an HQMF system, this interaction should be defined so that
the parametric shape of the probabilities in eq. (10) corresponds
to that of the classical MF system, eq. (4) [Perlovsky, 1996].
Note that the encoding states defined in eq. (24), in general, do
not form an orthogonal set. Using these states, the density
operator can be written as

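$$\rho(t) = \sum_{n} |x(n)\rangle \langle x(n)|. \qquad (25)$$

Let us also introduce an observation operator $X$ acting on the
encoding QMF states,

$$X = \sum_{n} |x(n)\rangle\, X(n)\, \langle x(n)|. \qquad (26)$$

Consider eigenstates $|\lambda\rangle$ of this operator,

$$X |\lambda\rangle = \lambda |\lambda\rangle, \quad \text{or} \quad \langle \lambda | X | \lambda \rangle = \lambda. \qquad (27)$$

Substituting eq. (26) into eq. (27),

$$\lambda = \sum_{n} |\langle \lambda | x(n) \rangle|^2\, X(n). \qquad (28)$$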
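The identity of eq. (28) can be verified numerically for a toy case. The sketch below assumes real two-component encoding states and scalar observations, builds the observation operator as a symmetric 2x2 matrix, and finds its eigenpairs in closed form. The encoding vectors and observation values are hypothetical; they are deliberately not orthogonal.

```python
import math

# Hypothetical real encodings |x(n)> (not mutually orthogonal) and scalar X(n).
encodings = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0)]
values = [1.0, 2.0, 5.0]

# Observation operator of eq. (26) as a symmetric matrix [[a, b], [b, c]].
a = sum(X * x[0] * x[0] for X, x in zip(values, encodings))
b = sum(X * x[0] * x[1] for X, x in zip(values, encodings))
c = sum(X * x[1] * x[1] for X, x in zip(values, encodings))

def eigenpairs(a, b, c):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, c]] (assumes b != 0)."""
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    pairs = []
    for lam in (mid - rad, mid + rad):
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

def rhs_of_eq28(v):
    """sum over n of |<lambda|x(n)>|^2 * X(n) for a candidate eigenvector v."""
    return sum(X * (v[0] * x[0] + v[1] * x[1]) ** 2 for X, x in zip(values, encodings))

pairs = eigenpairs(a, b, c)
for lam, v in pairs:
    print(lam, rhs_of_eq28(v))
```

Each eigenvalue is reproduced as an overlap-weighted sum of the observations, which is the form that is compared with the membership-weighted sums of the classical estimation equations.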

Comparing this to eqs. (10), (21), (22), and (23), we see that
$\lambda$ is identified with $M_h \cdot N_h$, and $|\lambda\rangle$ is
identified with $|h\rangle$. Introducing an operator

$$\mathcal{M} = \sum_{h \in H} |h\rangle\, M_h \cdot N_h\, \langle h|, \qquad (29)$$

we have the following identity for the quantum operators:

$$\mathcal{M} = X. \qquad (30)$$

This identity should be attained as the result of the internal
QMF dynamics. States $|h\rangle$, which are defined as the
eigenstates of the operator $\mathcal{M}$, should evolve (according to
the Schroedinger equation) into the eigenstates of $X$, and the
considered dynamic symbol-process is equivalent to the
process of finding eigenstates of the operator $X$. A number of
algorithms exist that can be used for this purpose [Brockett,
1991; Oja, 1992; Xu & Yuille, 1995]. These algorithms
utilize unitary transformations and can serve as a basis for the
design of a quantum system. A Hamiltonian $H$ can be defined
as

$$H(t) = i\,[\mathcal{M}, X]. \qquad (31)$$

According to [Brockett, 1991], this Hamiltonian defines a
dynamics that evolves eigenstates $|h\rangle$ of the operator
$\mathcal{M}$ into the eigenstates of the operator $X$. It might be noted
that a dynamical diagonalization of an operator naturally occurs
in many quantum systems; therefore, the choice of a specific
physical realization will determine a specific realization of the
Hamiltonian.
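The diagonalizing effect of a commutator-driven dynamics can be illustrated classically with the double-bracket flow dX/dt = [X, [X, M]] studied by Brockett, in which a symmetric matrix X is carried toward diagonal form in the eigenbasis of a fixed diagonal matrix M. The sketch below is a plain Euler integration on real 3x3 matrices; it is a numerical analogue under these assumptions, not a simulation of the quantum system, and the matrices, step size, and function names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def double_bracket_flow(X, M, dt=0.005, steps=4000):
    """Euler integration of dX/dt = [X, [X, M]]."""
    n = len(X)
    for _ in range(steps):
        dX = commutator(X, commutator(X, M))
        X = [[X[i][j] + dt * dX[i][j] for j in range(n)] for i in range(n)]
    return X

M = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]   # fixed diagonal operator
X0 = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]  # symmetric, not diagonal
X = double_bracket_flow(X0, M)

off_diagonal = max(abs(X[i][j]) for i in range(3) for j in range(3) if i != j)
diagonal = [X[i][i] for i in range(3)]
print(diagonal, off_diagonal)
```

For this starting matrix the exact eigenvalues are 3 - sqrt(3), 3, and 3 + sqrt(3); the flow drives the off-diagonal elements toward zero and leaves approximations of these values on the diagonal, ordered like the entries of M. The Euler step preserves the spectrum only approximately, so a careful implementation would use an orthogonal (unitary) update instead.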

If $X(n)$ are vector quantities (as they usually are), then
$|x(n)\rangle$ are defined with a corresponding vector-index, so that
$X$ is a vector-operator and the various components of $X$ operate
on the corresponding components of $|x(n)\rangle$. Thus components
of $X$ commute (and similarly, components of $\mathcal{M}$ commute).
If values of $N_h$ are known, then $M_h$ can be directly obtained
from $\mathcal{M}$. When $N_h$ values are not known a priori, eq.(19)
cannot serve as a definition of the operator $N$, and $N_h$ have to
be obtained similarly to $M_h$, in the process of internal HQMF
dynamics. By comparing eqs. (10), (12), (22), (24), and (25),
the $N$ operator is identified with the density operator. Its
eigenstates are different from $|h\rangle$, $N$ and $\mathcal{M}$ do not
commute, and expected values of $N$ in the $\mathcal{M}$-eigenstates
$|h\rangle$ are given by the diagonal elements of the density matrix
[Garvin & Perlovsky, 1995],

$$N_h = \langle h | \rho | h \rangle. \qquad (32)$$

The HQMF defined above evolves according to
Schroedinger (Hamiltonian) dynamics without energy
dissipation. I would also note that this algorithm does not
require an exponential number of interfering quantum states: the
"speed-up" relative to classical computation occurs as a
result of interference among a number of states that is
only a linear function of the complexity of the system. It
seems, however, that the required coherency (accuracy of
computation) is a constant that does not grow with
the complexity of the problem. This is because only a few
amplitudes $\langle h | x(n) \rangle$ (few $h$ for each $n$, and few $n$
for each $h$) interfere. In addition, the HQMF system has the
advantage of a relatively simple formulation, providing a
foundation for the next step of a physical realization of a
quantum computer.

4.  DISCUSSION 

The  first  dynamic  conception  of  mind  belongs  to 
Aristotle.  His  theory  of  Forms  describes  learning  as  a 
realization  of  an  a  priori  Form-as-potentiality.  The  conception 
of  mind  as  a  dynamic  system  based  on  a  priori  categories  was 
further developed by Kant (1781, 1788, 1790). Kant described
the working of the mind as a triadic process of
Understanding-Judgment-Reason. MFT gives the mathematical description
of this process (Perlovsky, 1998). Understanding is given by
a priori models, Judgment is given by a similarity measure,
and Reason is given by the adaptation mechanism. MFT
provides a mathematical description of both the Kantian theory of
mind and the process of semiosis. This establishes the
relationship between two profound contributions to the
analysis of mind, Kantian theory and semiotics. The dynamic
symbol-formation loop describes the Kantian triadic process of mind.
In  [Dmitriev,  Perlovsky,  1996],  these  loops  were  called  the 
vortexes  of  thought.  The  relationships  among  the  semiotical 
description  of  the  symbol-semiosis,  the  Kant  triadic  process, 
and  MFT  mathematical  agent-vortexes  will  be  illustrated. 
Sign-vehicles are the internal model codes; objects, or designata,
are the incoming data; and the interpretant (which recognizes
sign-models as corresponding to patterns in input data and generates
actions) is a combination of Judgment + Reason (that is,
similarity + adaptation and other actions of the activated
symbol within MFT). Syntactics is the logic governing
Understanding, that is, the relationships among internal models.
Semantics is Judgment, or the similarity measures relating input
data and models. Pragmatics includes the effects of
models on the similarity and adaptation actions of Reason,
modifying the models.

ABSTRACT

Semiotic  compounds  are  mixtures  of  probability  density 
functions.  They  are  used  to  formulate  joint  estimation 
problems  and  methods  that  transcend  the  boundaries  of  the 
traditional  detection,  classification,  and  state  estimation 
problem  hierarchy.  They  are  non-hierarchical  statistical 
models  because  they  are  marginal  densities  over  alternative 
hypotheses,  or  choices.  Several  novel  examples  of  semiotic 
compounds  are  discussed  in  this  paper.  A  common  thread 
running  through  these  examples  is  that  useful  semiotic 
compounds  often  have  mixture  components  that  are  related  by 
specific  invariance  transformations,  that  is,  the  components 
are  invariant  to  characteristic  transformations  that  are 
determined by the application.

semiotic compounds, invariance properties, invariance,

1.  INTRODUCTION

Traditional  techniques  for  detection,  classification,  and  state 
estimation  (or  localization)  are  typically  structured 
hierarchically,  that  is,  detection  is  undertaken  first,  then 
classification  is  performed  as  a  post-detection  problem,  and 
finally  state  estimation  is  conditioned  on  the  outcomes  of  the 
detection  and  classification  steps.  This  conditioning  hierarchy 
is  founded  on  natural  and  clear  distinctions  between  important 
statistical  issues  inherent  in  many  problems  and  applications; 
however,  hierarchical  conditioning  may  lead  to  performance 
that  is  inferior  to  joint  non-hierarchical  methods  when 
numerous  alternative  hypotheses  occur  within  the  problem. 

Semiotic  compounds  are  mixtures  of  probability  density 
functions  (PDFs).  Each  of  the  alternative  hypotheses 
contributes  one  component  to  the  mixture.  Semiotic 
compounds  are  presented  in  this  paper  as  the  constituent 
models  of  statistical  approaches  to  joint  detection, 
classification,  and  state  estimation  (or  localization) 
capabilities.  Joint  methods  and  algorithms  based  on  semiotic 
compounds  defy  easy  categorization  within  the  traditional 
hierarchical conditioning structure because of their use of
marginalization and because of their avoidance of conditioning
on specific hypotheses.

Semiotic  compounds  vary  considerably  depending  on  the  list 
of  alternatives  over  which  the  particular  marginalization,  or 
summation,  is  performed.  A  semiotic  compound  takes  the 
form  of  a  finite  sum  of  PDF's  if  the  alternatives  are  finite  in 
number,  and  it  takes  the  form  of  an  integral  if  the  alternatives 
are  characterized  by  one  or  more  real  valued  parameters.  In  the 
former  case  the  semiotic  compound  is  called  discrete,  and  in 
the  latter  it  is  called  continuous.  Continuous  semiotic 
compounds  are  not  widely  used,  perhaps  in  part  because  of  the 
(incorrect)  perception  that  they  are  merely  continuous 
extensions  of  the  more  widely  used  discrete  mixtures.  Infinite 
discrete  semiotic  compounds  are  also  potentially  interesting, 
but  no  examples  of  this  kind  are  presented  here. 
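
To make the discrete/continuous distinction concrete, the following minimal sketch evaluates one compound of each kind. It assumes, purely for illustration, scalar Gaussian components indexed by their mean, a uniform weighting over the continuous parameter, and a midpoint-rule quadrature; the function names are hypothetical and none of these choices come from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Scalar Gaussian density; `mean` may be an array of component means."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def discrete_compound(x, weights, means, std):
    """Finite sum of component PDFs, one per alternative hypothesis."""
    return sum(w * gaussian_pdf(x, m, std) for w, m in zip(weights, means))

def continuous_compound(x, lo, hi, std, n_cells=2000):
    """Integral over a real-valued mean parameter, uniformly weighted on [lo, hi].

    The integral is approximated with the midpoint rule (an illustrative choice).
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    cell_width = (hi - lo) / n_cells
    prior = 1.0 / (hi - lo)
    return float(np.sum(gaussian_pdf(x, mids, std) * prior) * cell_width)

p_discrete = discrete_compound(0.0, [0.5, 0.5], [-1.0, 1.0], std=0.5)
p_continuous = continuous_compound(0.0, -1.0, 1.0, std=0.5)
```

In this toy case the two compounds differ markedly at x = 0, because the continuous compound places weight on every mean between the two discrete alternatives.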

Semiotic  compounds  also  vary  in  the  nature  of  the  component 
PDFs  characterizing  the  alternative  hypotheses.  In  this  paper 
it is assumed that the component PDFs have the same
parametric form, but different values of the parameters; that is,
they belong to the same general parametric family. Discrete
Gaussian  semiotic  compounds  are  easily  the  most  widely  used 
in  applications  to  date.  Mixtures  of  disparate  types  of  PDFs 
are  possible,  but  are  not  considered  in  this  paper. 

An  important  contribution  of  this  paper  lies  in  the  importance 
it  ascribes  to  the  concept  of  component  invariance  under 
transformation.  Recognizing  the  appropriate  invariant 
transformation  can  give  considerable  insight  into  the  problem 
under  study.  Invariance  is  not  discussed  in  generality;  rather, 
it  is  discussed  in  the  context  of  the  following  specific 
problems (which may be read independently):

(a)  In Section 2, a generalized likelihood ratio test (GLRT)
for detecting transient and spread spectrum signals in
noise is treated using a discrete exponential semiotic
compound; estimates of both signal-to-noise ratio (SNR)
level and bandwidth fall out as by-products of the
GLRT procedure.

(b)  In  Section  3,  three  discrete  Gaussian  semiotic 
compounds  are  discussed  for  estimating  a  PDF  from 
sample  data.  Although  two  of  these  compounds  are 
well known, the third is new and potentially very
useful  because  of  its  numerical  and  statistical 
robustness.

(c)  In Section 4, a continuous semiotic compound is
applied to the so-called bearings-only target motion
analysis (TMA) state estimation problem.

Concluding remarks are given in Section 5.


2.    TRANSIENT  DETECTION 

Difficulties  arise  in  GLRT  detection  problems  where  one  or 
more  of  the  signal  parameters  requires  an  enumeration  that  is 
computationally  intractable.  In  transient  signal  detection  the 
frequency  characteristics  of  the  signal  are  typically  unknown, 
so  even  if  aggregate  bandwidth  is  assumed,  the  estimation 
problem  intrinsic  to  the  GLRT  requires  an  enumeration  of  all 
possible  sets  of  signal  locations  within  the  monitored  band. 

A  prior  distribution  is  imposed  on  the  portion  of  the  signal 
parameter  space  that  usually  requires  intractable  enumeration. 
By  replacing  enumeration  with  the  problem  of  estimating  the 
hyperparameters of the model jointly with the other signal
parameters, a new formulation of the problem that avoids
enumeration and is computationally feasible becomes possible. The
GLRT approach is not changed by this model; what is
changed is the signal model.

An  observation  vector  X  of  length  N  is  obtained  by  taking  the 
magnitude  squared  of  the  output  of  an  FFT  whose  input  is  a 
broadband  Gaussian  time  series.  Elements  of  X  are  therefore 
independent  and  exponentially  distributed.  It  is  assumed  that 
the  exponential  distributions  have  one  of  two  possible  means: 
the  larger  mean,  denoted  by  a,  is  the  expected  signal-plus-noise 
power;  the  smaller  mean,  denoted  by  b,  is  the  expected  noise- 
only  power.  Every  element  of  X  is  assumed  to  contain  a 
sample  obtained  under  either  {H:  noise  only}  or  {K:  signal 
plus  noise}.  Thus,  two  mutually  exclusive  and  exhaustive 
alternative  hypotheses  are  assumed. 

A  Bernoulli  random  variate  with  outcomes  {H,  K}  having 
probabilities  {1-q,  q},  respectively,  is  used  to  model  the 
aggregate  signal  bandwidth,  that  is,  the  number  of  elements  of 
X  that  contain  signal.  In  this  model,  N  samples  of  the 
Bernoulli  variate  are  generated,  one  for  each  element  of  X. 
These  outcomes  comprise  the  missing  data  in  the  sense  of  the 
method  of  expectation-maximization  (EM).  Samples  of  the 
FFT  outputs  are  then  drawn  from  the  exponential  density 
determined  by  the  Bernoulli  outcomes.  Thus,  the  expected 
signal  bandwidth  is  equal  to  qN. 
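
The generative model just described can be simulated directly. The sketch below is illustrative only: the function name and the parameter values are assumptions, and NumPy's random generator stands in for whatever source of randomness an application would use.

```python
import numpy as np

def simulate_fft_powers(n_bins, q, a, b, rng):
    """Draw one observation vector X under the Bernoulli signal model.

    Each bin contains signal with probability q (hypothesis K) and is
    noise-only otherwise (hypothesis H); the bin power is exponential with
    mean a under K and mean b under H.
    """
    has_signal = rng.random(n_bins) < q       # Bernoulli outcomes: the missing data
    means = np.where(has_signal, a, b)
    x = rng.exponential(means)                # one exponential draw per bin
    return x, has_signal

rng = np.random.default_rng(0)
x, has_signal = simulate_fft_powers(n_bins=100_000, q=0.2, a=5.0, b=1.0, rng=rng)
occupied_bins = int(has_signal.sum())         # close to q * N on average
```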

The  parameters  {q,a,b}  are  unknown  and  must  be  estimated 
from  the  data  vector  X  using  maximum  likelihood  (ML). 
Marginalizing over the Bernoulli random variates gives the
likelihood function L(q,a,b) as a product over the data of a
two-component mixture of exponential densities,

$$L(q,a,b) = \prod_{i=1}^{N} \left[ (1-q)\,\frac{1}{b}\,\mathrm{Exp}[-X_i/b] + q\,\frac{1}{a}\,\mathrm{Exp}[-X_i/a] \right].$$


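Given current estimates {q,a,b} for the parameters, for
i = 1,...,N compute the weights

$$w_i(H) = \frac{(1-q)\,\frac{1}{b}\,\mathrm{Exp}[-X_i/b]}{(1-q)\,\frac{1}{b}\,\mathrm{Exp}[-X_i/b] + q\,\frac{1}{a}\,\mathrm{Exp}[-X_i/a]}$$

and

$$w_i(K) = \frac{q\,\frac{1}{a}\,\mathrm{Exp}[-X_i/a]}{(1-q)\,\frac{1}{b}\,\mathrm{Exp}[-X_i/b] + q\,\frac{1}{a}\,\mathrm{Exp}[-X_i/a]}.$$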

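Updated estimates of noise power and signal-plus-noise power
are then given by the convex combinations

$$b = \frac{\sum_{i=1}^{N} w_i(H)\,X_i}{\sum_{i=1}^{N} w_i(H)}$$

and

$$a = \frac{\sum_{i=1}^{N} w_i(K)\,X_i}{\sum_{i=1}^{N} w_i(K)}.$$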

The  expected  signal  bandwidth  is  given  by 


$$qN = \sum_{i=1}^{N} w_i(K).$$


These  equations  complete  the  algorithm  statement. 
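
The iteration can be sketched in a few lines of NumPy. This is a minimal illustration of the standard EM updates for a two-component exponential mixture, not the implementation reported in [1]; the function name, the fixed iteration count, the starting values, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def em_exponential_mixture(x, q, a, b, n_iter=500):
    """EM for a two-component exponential mixture.

    b is the noise-only mean (hypothesis H), a is the signal-plus-noise mean
    (hypothesis K), and q is the probability that a bin contains signal.
    """
    x = np.asarray(x, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior weights w_i(H) and w_i(K) for every bin.
        noise = (1.0 - q) * np.exp(-x / b) / b
        signal = q * np.exp(-x / a) / a
        w_k = signal / (noise + signal)
        w_h = 1.0 - w_k
        # M-step: convex combinations of the data, then the signal fraction.
        b = np.sum(w_h * x) / np.sum(w_h)
        a = np.sum(w_k * x) / np.sum(w_k)
        q = np.mean(w_k)
    return q, a, b

# Synthetic data with known parameters (values chosen only for illustration).
rng = np.random.default_rng(1)
n_bins = 20_000
has_signal = rng.random(n_bins) < 0.3
x = rng.exponential(np.where(has_signal, 10.0, 1.0))

q_hat, a_hat, b_hat = em_exponential_mixture(
    x, q=0.5, a=2.0 * x.mean(), b=0.5 * x.mean()
)
expected_bandwidth = q_hat * n_bins
```

Starting with a above b keeps the roles of the two components fixed in practice; a production version would also monitor the likelihood for convergence rather than run a fixed number of iterations.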

The  GLRT  modeling  philosophy  requires  that  the  final 
estimates  of  {q,a,b}  be  substituted  into  a  likelihood  ratio  for 
testing  H  against  K.  Further  details  are  given  in  [1],  where 
the  GLRT  is  shown  to  be  comparable  to  the  best  of  a  family 
of  nearly  optimal  power-law  detectors  developed  for  this 
problem. 

Extension  of  the  Bernoulli  signal  model  to  p>2  power  levels 
(i.e.,  outcomes)  requires  semiotic  mixtures  of  p  exponential 
PDFs;  applying  the  method  of  EM  leads  to  an  ML  algorithm 
that  is  very  similar  to  that  sketched  above.  Further  details  are 
given  in  [1]. 

The  importance  of  this  example  is  two-fold.  Firstly,  it  shows 
that  the  detection  statistic  requires  joint  estimation  of  signal 
and  noise  parameters  from  the  data  vector  and  that  signal 
bandwidth  is  estimated  before  the  signal  detection  decision  is 
completed.  Thus,  the  GLRT  does  not  fit  the  traditional 
statistical  hierarchy  in  this  problem.  Secondly,  the  invariant 
transformation  of  the  exponential  PDFs  used  in  this  problem 
is  scalar  multiplication.  The  difference  between  the  noise  and 
signal  models  is  essentially  an  arbitrary  distinction  based  on 
their  power  levels. 


Straightforward  application  of  the  method  of  EM  using  the 
Bernoulli signal model yields the following ML algorithm:

3.   DENSITY  ESTIMATION

Discrete  Gaussian  semiotic  compounds  are  widely  used  to 
approximate  an  unknown  PDF  from  which  a  set  of 
statistically  independent  and  identically  distributed  (i.i.d.) 
sample  data  have  been  drawn.  ML  algorithms  are  easily 
derived  via  the  method  of  EM  for  estimating  the  parameters  of 
two  classes  of  Gaussian  mixture:  homoscedastic  ("same 
scatter")  mixtures,  which  employ  the  same  covariance  matrix 
in  every  component;  and  heteroscedastic  ("different  scatter") 
mixtures,  which  use  different  covariance  matrices  in  every 
component.  The  former  are  robust  numerically,  but  may 
sometimes  require  many  components  to  achieve  good  PDF 
approximation  accuracy.  Heteroscedastic  mixtures  may  require 
fewer  components  for  the  same  PDF  accuracy  as 
homoscedastic  mixtures,  but  they  may  be  numerically 
unstable  in  that  one  or  more  of  the  covariance  matrices  may 
become  singular  during  iteration.  Deficiencies  of 
heteroscedastic  mixtures  become  very  severe  as  the 
dimensionality  of  the  sample  data  PDF  increases.  Further 
discussion  of  these  mixture  classes  is  given  in  [2]  and  in  the 
references  therein. 

In  this  paper,  covariance  restrictions  are  seen  to  be  special 
cases of the general principle of component invariance under
transformation.  The  transformation  is  selected  to  characterize 
the  common  semiotic  content  of  the  fundamental  symbol 
under  study;  that  is,  components  that  are  identical  under  the 
specified  transformation  comprise  the  components  of  the 
mixture,  and  each  component  is  a  legitimate  expression  of  the 
same  symbolic  content.  Thus,  the  mixture  characterizes  the 
symbol. 

The  invariance  transformation  of  homoscedastic  mixtures  is 
rigid  translation,  because  components  differ  only  in  their 
means.  For  heteroscedastic  mixtures,  the  components  may  be 
general  Gaussian  PDFs,  so  the  invariance  transformation  is 
the  combined  transformations  of  translation,  rotation,  and 
dilation.  A  rotation  and  dilation  will  transform  one 
covariance  matrix  into  another,  as  is  easily  seen  from  the 
singular  value  decomposition  (SVD). 

It  is  clear  from  the  above  paragraph  that  an  important  class  of 
invariance  transformations  has  been  omitted,  namely,  the  class 
of  transformation  comprising  translation  and  rotation  only. 
Gaussian  semiotic  compounds  with  this  invariance 
transformation are called herein strophoscedastic ("twisted
scatter")  compounds,  and  they  were  first  discussed  in  [3]. 
Surprisingly,  the  method  of  EM  yields  an  explicit  ML 
estimation  algorithm,  that  is,  the  M-step  of  the  method  of  EM 
can  be  solved  explicitly,  just  as  it  can  in  the  homoscedastic 
and  heteroscedastic  cases.  Although  the  derivation  is 
significantly  more  difficult,  the  resulting  algorithm  is 
intuitively  reasonable  in  that  it  differs  from  the  algorithms  for 
the  other  two  mixtures  simply  in  the  amount  of  averaging  that 
is  required. 
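
A small numerical illustration of the strophoscedastic constraint may help: every component covariance is a rotated copy of one diagonal kernel, so all components share the same eigenvalues and differ only in orientation. The sketch below assumes arbitrary kernel values and random orthogonal matrices; the helper name is hypothetical.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))            # flipping column signs keeps q orthogonal

rng = np.random.default_rng(0)
kernel = np.diag([4.0, 1.0, 0.25])            # shared diagonal kernel (illustrative values)
rotations = [random_orthogonal(3, rng) for _ in range(4)]

# One covariance per component: the same kernel viewed in a rotated frame.
covariances = [Q.T @ kernel @ Q for Q in rotations]

# Every component has the same spectrum, listed here in decreasing order.
spectra = [np.sort(np.linalg.eigvalsh(C))[::-1] for C in covariances]
```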


Let  p(x)  denote  the  true,  but  unknown,  PDF  of  the  sample 
data,  and  let  d  >  1  denote  the  dimension  of  the  samples.  The 
density  p(x)  is  approximated  by  a  strophoscedastic  mixture, 
denoted  by  q(x),  having  N  components.  N  is  assumed  given; 
its  selection  is  a  model  order  selection  problem  and  is  outside 
the  scope  of  this  paper.  The  parameters  defining  the 
strophoscedastic approximation

$$p(x) \approx q(x) = \sum_{n=1}^{N} \pi_n\, \mathcal{N}(x \mid \mu_n,\, Q_n' \Lambda Q_n) \qquad (1)$$

are  given  as  follows: 
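(i)  Mixing proportions $\pi = \{\pi_1, \dots, \pi_N\}$ such that
$\pi_n > 0$ for n = 1, ..., N and $\pi_1 + \dots + \pi_N = 1$,

(ii)  Mean vectors $\mu = \{\mu_1, \dots, \mu_N\}$ such that $\mu_n \in R^d$ for
n = 1, ..., N,

(iii)  Positive definite diagonal kernel matrix
$\Lambda = \mathrm{Diag}[\sigma_1, \dots, \sigma_d] \in R^{d \times d}$, $\sigma_1 \ge \dots \ge \sigma_d > 0$,

(iv)  Orthogonal matrices $\Omega = \{Q_1, \dots, Q_N\}$,
$Q_n \in R^{d \times d}$, $Q_n' Q_n = Q_n Q_n' = I$ for n = 1, ..., N.
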
The parameters in the list (i) - (iv) are estimated from the
sample data using the ML method. Let K be the number of
i.i.d. samples in $X = \{x_1, x_2, \dots, x_K\}$, where $x_k \in R^d$ is
drawn from the distribution p(x). Using the approximation
(1) and the independence of the samples X gives the posterior
likelihood function

$$\mathcal{L}(X \mid \pi, \mu, \Lambda, \Omega) = \prod_{k=1}^{K} q(x_k \mid \pi, \mu, \Lambda, \Omega). \qquad (2)$$

Applying  the  method  of  EM  requires  defining  random 
variables  to  characterize  the  so-called  missing  data.  The 
missing  data  are  easily  understood  from  a  simulation  of  the 
mixture  q(x):  Firstly,  the  discrete  PDF  defined  on  the  set 
{1,...,N} by the probability vector $\pi$ is sampled to obtain a
particular  Gaussian  component  of  the  mixture.  Secondly,  this 
Gaussian  component  is  sampled  to  produce  a  vector  in  the 
sample  set  X.  Every  sample  in  X  is  modeled  as  having 
arisen  in  this  fashion,  so  the  missing  data  are  the  discrete 
outcomes  of  the  first  step  of  the  hypothetical  simulation. 
Using  the  missing  data  enables  the  method  of  EM  to  be 
applied to derive an ML algorithm that is guaranteed by general
properties of the method to generate a sequence of parameter
estimates that monotonically increase the likelihood function
(2). Thus, if the likelihood function is uniformly bounded
above  for  any  choice  of  parameter  set,  the  algorithm  converges 
to  either  a  local  ML  parameter  estimate  or  a  stationary  point 
of  (2). 
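
The two-step simulation that defines the missing data can be written out as follows. This is a sketch under stated assumptions: the function name and the example parameter values are hypothetical, and each component covariance is taken to be the kernel rotated by that component's orthogonal matrix, as in approximation (1).

```python
import numpy as np

def sample_strophoscedastic(n_samples, pi, mu, kernel, Qs, rng):
    """Draw samples from a strophoscedastic Gaussian mixture.

    pi: (N,) mixing proportions; mu: (N, d) means; kernel: (d, d) diagonal
    matrix; Qs: (N, d, d) orthogonal matrices.
    """
    # Step 1: sample the component labels (the missing data).
    labels = rng.choice(len(pi), size=n_samples, p=pi)
    # Step 2: sample the chosen Gaussian component.
    d = mu.shape[1]
    z = rng.standard_normal((n_samples, d)) * np.sqrt(np.diag(kernel))  # covariance = kernel
    # x_k = mu_n + Q_n' z_k, which has covariance Q_n' kernel Q_n.
    x = mu[labels] + np.einsum("kji,kj->ki", Qs[labels], z)
    return x, labels

rng = np.random.default_rng(0)
angle = np.pi / 3
rot = np.array([[np.cos(angle), np.sin(angle)], [-np.sin(angle), np.cos(angle)]])
pi = np.array([0.3, 0.7])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
kernel = np.diag([4.0, 0.25])
Qs = np.stack([np.eye(2), rot])
x, labels = sample_strophoscedastic(50_000, pi, mu, kernel, Qs, rng)
```

An estimation algorithm sees only `x`; the array `labels` is exactly the information that the method of EM treats as missing.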

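For m = 0, let $\pi^{(m)}$, $\mu^{(m)}$, $\Lambda^{(m)}$, and $\Omega^{(m)}$ be initial
estimates for the parameters defining the strophoscedastic
mixture that satisfy the conditions imposed in (i) - (iv) above.
For integers m ≥ 0, updated estimates for iteration m+1 are
given by the following equations. For n = 1, ..., N and k = 1,
..., K, compute the weights

$$w_{nk}^{(m+1)} = \frac{\pi_n^{(m)}\, \mathcal{N}\big(x_k \mid \mu_n^{(m)},\, Q_n^{(m)\prime} \Lambda^{(m)} Q_n^{(m)}\big)}{\sum_{j=1}^{N} \pi_j^{(m)}\, \mathcal{N}\big(x_k \mid \mu_j^{(m)},\, Q_j^{(m)\prime} \Lambda^{(m)} Q_j^{(m)}\big)}.$$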


The expected number of samples from component n,
conditioned on parameter estimates at iteration m, is

$$K_n^{(m+1)} = \sum_{k=1}^{K} w_{nk}^{(m+1)}.$$

The updated mixing proportions are given by

$$\pi_n^{(m+1)} = K_n^{(m+1)} / K$$

and the updated mean vectors by

$$\mu_n^{(m+1)} = \frac{1}{K_n^{(m+1)}} \sum_{k=1}^{K} w_{nk}^{(m+1)}\, x_k.$$


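The within-component sample data covariance matrices
(conditioned on parameter estimates at iteration m) are

$$S_n^{(m+1)} = \frac{1}{K_n^{(m+1)}} \sum_{k=1}^{K} w_{nk}^{(m+1)} \big(x_k - \mu_n^{(m+1)}\big)\big(x_k - \mu_n^{(m+1)}\big)'.$$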


Compute the ordered SVDs

$$S_n^{(m+1)} = U_n^{(m+1)}\, \Sigma_n^{(m+1)}\, \big(U_n^{(m+1)}\big)',$$

where the matrices $U_n^{(m+1)}$ are orthogonal and the diagonal
elements of the matrices

$$\Sigma_n^{(m+1)} = \mathrm{Diag}\big[\sigma_{1n}^{(m+1)}, \sigma_{2n}^{(m+1)}, \dots, \sigma_{dn}^{(m+1)}\big]$$

are arranged in decreasing order. The updated orthogonal
matrices of the strophoscedastic mixture are the transposes of
the orthogonal matrices of the ordered SVDs, that is,

$$Q_n^{(m+1)} = \big(U_n^{(m+1)}\big)'.$$

The kernel matrix is obtained by averaging the diagonal
matrices of the SVDs over all mixture components:

$$\Lambda^{(m+1)} = \sum_{n=1}^{N} \pi_n^{(m+1)}\, \Sigma_n^{(m+1)}.$$


This  completes  the  statement  of  the  algorithm.  A  derivation 
of  the  algorithm  is  given  in  [4]. 
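
One iteration of the algorithm can be sketched in NumPy as below. This is an illustrative reading of the update equations, not the derivation or code of [4]: the function names are hypothetical, the E-step weights are the usual posterior component probabilities, and the within-component covariances are assumed to be normalized by the expected component counts.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (maha + d * np.log(2.0 * np.pi) + logdet))

def weighted_densities(x, pi, mu, kernel, Qs):
    """Rows are pi_n * N(x_k | mu_n, Q_n' Lambda Q_n); shape (N, K)."""
    return np.stack([pi[n] * gaussian_pdf(x, mu[n], Qs[n].T @ kernel @ Qs[n])
                     for n in range(len(pi))])

def log_likelihood(x, pi, mu, kernel, Qs):
    return float(np.sum(np.log(weighted_densities(x, pi, mu, kernel, Qs).sum(axis=0))))

def em_step(x, pi, mu, kernel, Qs):
    """One EM iteration for a strophoscedastic Gaussian mixture."""
    n_samples, d = x.shape
    dens = weighted_densities(x, pi, mu, kernel, Qs)
    w = dens / dens.sum(axis=0, keepdims=True)        # weights w[n, k]
    k_n = w.sum(axis=1)                               # expected counts per component
    pi_new = k_n / n_samples
    mu_new = (w @ x) / k_n[:, None]
    Qs_new = np.empty_like(Qs)
    kernel_diag = np.zeros(d)
    for n in range(len(pi)):
        diff = x - mu_new[n]
        s_n = (w[n][:, None] * diff).T @ diff / k_n[n]    # within-component covariance
        u, sing, _ = np.linalg.svd(s_n)                   # singular values, decreasing
        Qs_new[n] = u.T                                   # Q_n is the transpose of U_n
        kernel_diag += pi_new[n] * sing                   # average only the singular values
    return pi_new, mu_new, np.diag(kernel_diag), Qs_new
```

Calling `em_step` repeatedly and tracking `log_likelihood` gives a simple check on an implementation, since the likelihood should never decrease from one iteration to the next.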

ML  algorithms  for  the  three  mixture  classes  differ  only  in 
where  averaging  is  applied.  Homoscedastic  algorithms 
average  the  within-component  sample  data  covariance  matrices 
to  obtain  the  component  covariance  matrix.  Strophoscedastic 
algorithms average only the singular values of the
within-component matrices. Heteroscedastic mixtures do not average
the  within-component  matrices  at  all.  The  presence  of 
averaging  explains  intuitively  why  homoscedastic  and 
strophoscedastic  ML  mixture  algorithms  are  so  reliable  and 
unlikely  to  encounter  singular  covariance  matrices  during 
iteration. 
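
The difference in where averaging is applied can be shown side by side. The sketch below assumes the within-component covariance matrices have already been computed and that averages are weighted by the mixing proportions; the function name and the `mode` strings are hypothetical.

```python
import numpy as np

def component_covariances(S, pi, mode):
    """Component covariances for the three Gaussian mixture classes.

    S: (N, d, d) within-component sample covariances; pi: (N,) mixing proportions.
    """
    if mode == "heteroscedastic":
        return S.copy()                                   # no averaging at all
    if mode == "homoscedastic":
        pooled = np.einsum("n,nij->ij", pi, S)            # average the full matrices
        return np.broadcast_to(pooled, S.shape).copy()
    if mode == "strophoscedastic":
        U, sing, _ = np.linalg.svd(S)                     # batched SVD, values decreasing
        kernel = (pi[:, None] * sing).sum(axis=0)         # average only the singular values
        return np.einsum("nij,j,nkj->nik", U, kernel, U)  # U_n Diag(kernel) U_n'
    raise ValueError(f"unknown mode: {mode}")

angle = 0.6
R = np.array([[np.cos(angle), np.sin(angle)], [-np.sin(angle), np.cos(angle)]])
S = np.stack([np.diag([1.0, 0.0]),                        # a singular component
              R.T @ np.diag([2.0, 1.0]) @ R])
pi = np.array([0.5, 0.5])
stropho = component_covariances(S, pi, "strophoscedastic")
```

In this example the first within-component matrix is singular, yet both strophoscedastic covariances have eigenvalues 1.5 and 0.5, which illustrates how averaging protects against singular matrices.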


4.  TARGET  LOCALIZATION 

Classical  TMA  statistical  models  are  post-detection  models, 
that  is,  they  assume  a  priori  that  the  measurements  Z  belong 
to  a  common  target  having  a  specified  parametric  form  (e.g., 
straight  line  motion).  Post-detection  tracking  implies  that 
measurements  are  independent  if  they  are  conditioned  on  the 
target. ML estimators thus answer the question "Given data
generated from a target track, which parameterized track best
fits the data?" In contrast, the empirical maximum a
posteriori (EMAP) estimators discussed here differ
fundamentally  from  classical  post-detection  TMA  because  they 
are  joint  detection-estimation  methods  which  seek  to  answer 
the  alternative  question  "Does  a  target  track  of  the  specified 
parametric  form  fit  the  data?"  A  generalized  likelihood  ratio 
test  (GLRT)  in  which  track  parameters  are  estimated  and 
substituted  into  a  likelihood  ratio  is  the  EMAP  answer  to  the 
question;  however,  it  is  the  estimated  track  —  and  not  the 
GLRT  detector  —  which  is  the  object  of  interest  in  this 
paper. 

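The data $\{Z, X^o\} = \{(Z_n, X_n^o)\}_{n=1}^{N}$ are assumed statistically
independent because measurements are not specified a priori to
belong to the same track. Independence implies that the joint
density of the data factors into a product over the N
measurement-sensor pairs.
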
The data $\{(Z_n, X_n^o)\}_{n=1}^{N}$ contribute independent probability
density assessments of "potential" target position that are valid
at the measurement times. Let $X^a = \{X_n^a\}_{n=1}^{N}$ denote
so-called "empirical" random variables associated with potential
target location. Empirical random variables are assumed
independent when conditioned on corresponding measurements
and sensor locations; hence, the empirical target location PDF
is

$$p_{X^a \mid Z X^o}(X^a \mid Z, X^o) = \prod_{n=1}^{N} p_{X_n^a \mid Z_n X_n^o}(X_n^a \mid Z_n, X_n^o).$$

The empirical target likelihood function is evaluated for
specified parametric target motion models, once the
conditional density of $X_n^a$ is defined.

In the remainder of this section the passive azimuthal
bearings-only TMA problem is considered; thus, it is
assumed that the target lies in the x-y plane, that $Z_n = \theta_n$,
where $\theta_n$ is the measured azimuthal bearing from the sensor
to the target, and that $X_n^o = (x_n^o, y_n^o)$ is the sensor location
when the bearing $\theta_n$ is obtained. The target is unobservable from
a single bearing because range measurements are assumed to
be unavailable from the passive sensor.

A continuous semiotic compound, or marginal integral
representation, of the PDF of the empirical state variables can
be derived by introducing the dummy random variable $r_n$ to
model the "missing" sensor range measurement corresponding
to $\theta_n$. Full details of this novel approach to target
observability difficulties are provided in [5] and are not
repeated here. Sensor and target characteristics are assumed to
be such that the missing range variable lies in a known finite
range interval, $r_{\min} \le r_n \le r_{\max}$.

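Let $X = (x_1, y_1, x_N, y_N)$ denote the endpoint coordinates of
a target with straight line motion. Substituting this motion
model into the integral representation, and substituting the
result into the overall empirical likelihood function gives

$$p_{X^a \mid Z X^o}\big(X^a(X) \mid Z, X^o\big) = \prod_{n=1}^{N} \int_{r_{\min}}^{r_{\max}} c_n\, \mathrm{Exp}\{-\tfrac{1}{2} Q\}\, dr_n,$$

where $c_n$ denotes the normalizing factor of the n-th Gaussian
component, which may depend on $r_n$, and the quadratic form Q of
the exponential function is

$$Q = v_n'\, \Sigma_n^{-1}\, v_n, \qquad v_n = \begin{bmatrix} \alpha(t_n)\, x_1 + \beta(t_n)\, x_N - x_n^o - r_n \cos\theta_n \\ \alpha(t_n)\, y_1 + \beta(t_n)\, y_N - y_n^o - r_n \sin\theta_n \end{bmatrix},$$

with $\Sigma_n$ the covariance matrix of the quadratic form, and the
linear interpolation functions are given by

$$\alpha(t) = \frac{t_N - t}{t_N - t_1} \quad \text{and} \quad \beta(t) = \frac{t - t_1}{t_N - t_1}.$$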


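The covariance matrix of the quadratic form Q is given by

$$\Sigma_n = \begin{bmatrix} \cos\theta_n & \sin\theta_n \\ -\sin\theta_n & \cos\theta_n \end{bmatrix}^{T} \begin{bmatrix} r_n^2 \kappa_n^2 & 0 \\ 0 & r_n^2 \sigma_n^2 \end{bmatrix} \begin{bmatrix} \cos\theta_n & \sin\theta_n \\ -\sin\theta_n & \cos\theta_n \end{bmatrix},$$

where $\sigma_n^2$ is the variance of the bearing $\theta_n$, and $\kappa_n^2$ is the
variance of the missing range variable $r_n$.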

By  discretizing  the  integrals  and  applying  the  method  of  EM, 
an  EMAP  estimation  algorithm  can  be  derived.  It  can  be 
shown  that  this  algorithm  is  a  sequence  of  iteratively 
reweighted linear least squares problems. The variance of the
missing range measurement $r_n$ is a free variable that can be
used to greatly accelerate the convergence of the EM
algorithm.  Further  details  will  be  reported  in  [6]. 
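
The straight-line motion model that enters the quadratic form Q can be sketched as follows. The interpolation weights used here, which equal 1 and 0 at the first measurement time and 0 and 1 at the last, are an assumption consistent with linear interpolation between the track endpoints; the function names are hypothetical, and the bearing is taken to be measured from the x-axis, matching the cosine and sine terms in Q.

```python
import numpy as np

def track_position(t, t_first, t_last, start, end):
    """Linearly interpolated target position at time t on a straight-line track."""
    alpha = (t_last - t) / (t_last - t_first)   # weight on the first endpoint
    beta = (t - t_first) / (t_last - t_first)   # weight on the last endpoint
    return alpha * np.asarray(start, dtype=float) + beta * np.asarray(end, dtype=float)

def range_bearing_residual(t, t_first, t_last, start, end, sensor, r, theta):
    """Track position minus the point at range r along bearing theta from the sensor."""
    line_of_sight = np.array([np.cos(theta), np.sin(theta)])
    position = track_position(t, t_first, t_last, start, end)
    return position - np.asarray(sensor, dtype=float) - r * line_of_sight

# Track from (1, 2) at t = 0 to (5, 6) at t = 10, observed from the origin at t = 5.
midpoint = track_position(5.0, 0.0, 10.0, (1.0, 2.0), (5.0, 6.0))
residual = range_bearing_residual(5.0, 0.0, 10.0, (1.0, 2.0), (5.0, 6.0),
                                  sensor=(0.0, 0.0), r=5.0, theta=np.arctan2(4.0, 3.0))
```

The residual vanishes when the hypothesized range and the measured bearing place the target exactly on the track, which is the configuration the quadratic form rewards.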

The  quadratic  exponent  Q  of  the  Gaussian  component  of  the 
continuous  semiotic  compound  is  the  only  form  satisfying  a 
combination  of  geometrical  and  analytical  properties  natural  to 
the  bearings-only  TMA  problem.  Firstly,  empirical  target 
location  is  assumed  to  be  Gaussian  distributed  when 
conditioned  on  actual  target  location.  This  is  a  Bayesian 
assumption  about  the  empirical  target,  not  the  target. 
Secondly,  the  marginal  density  over  the  empirical  range 
variable  of  the  conditional  empirical  PDF  is  required  to  be 
independent  of  actual  target  range.  This  requirement  implies 


that  the  cross-range  and  down-range  variances  must  grow  as 
the  square  of  target  range.  Thirdly,  the  marginal  density  over 
empirical  range  is  required  to  be  a  function  of  the  difference 
between  the  empirical  bearing  and  the  target  bearing.  This 
requirement  forces  the  axes  of  the  empirical  covariance  matrix 
to  align  with  the  cross-range  and  down-range  directions.  In 
summary,  these  requirements  completely  characterize  the 
Gaussian  component  density  of  the  semiotic  compound,  that 
is,  the  quadratic  form  Q  is  the  unique  form  that  satisfies  these 
requirements.  For  a  detailed  derivation,  see  [5,  Appendix  A]. 

As  is  seen  from  direct  examination  of  Q,  the  components  of 
the  continuous  semiotic  compound  are  invariant  under  the 
combined  transformations  of  translation  down-range  and 
square-law spreading of the variances of its principal
components  with  increasing  range.  The  resulting  integral 
representation  of  the  empirical  density  is  remarkable  in  that  it 
is  very  nearly  constant  along  straight  lines  radiating  outward 
from  the  sensor  location,  and  moreover  it  is  very  nearly 
Gaussian along circular cuts centered on the sensor. Both of
these statements are valid within the annulus with inner and
outer radii $r_{\min} + \varepsilon$ and $r_{\max} - \varepsilon$, respectively, where $\varepsilon$ is a
small positive number.

Multitarget  applications  of  discrete  semiotic  compounds  are 
also  possible.  In  these  applications  the  missing  data  is  the 
correct  assignment  of  a  measurement  to  a  target.  Just  as  in 
the  examples  above,  the  method  of  EM  can  be  used  to  derive 
ML  estimation  algorithms.  Invariance  transformations  have 
not  yet  been  sought  for  these  problems.  See  [7,  8,  9]  for 
further  details. 


5.    CONCLUDING  REMARKS 

The  common  thread  linking  the  various  examples  is  that  of  an 
invariant  transformation.  However,  the  appropriate  nature  of 
the  transformation  depends  strongly  on  the  given  application. 
It  also  appears  from  these  examples  that  semiotic  mixtures  of 
components  constructed  (i.e.,  derived)  to  be  invariant  under  a 
specified  set  of  transformations  may  possess  remarkable 
properties  that  are  important  for  the  application.  In  any  event, 
more  examples  are  needed  to  illustrate  the  utility  of  semiotic 
compounds  with  invariance  transformations  for  joint  detection, 
classification,  and  localization. 

The  method  of  EM  is  not  the  only  available  ML  estimation 
method,  for  in  any  problem  to  which  EM  is  applicable,  the 
standard  necessary  conditions  may  always  be  derived  and  solved 
directly.  EM  is  merely  a  general  procedure  for  deriving  local 
ML  parameter  estimation  algorithms  in  problems  whose 
likelihood  function  is  modeled  as  a  marginal  PDF  over 
"missing"  information.  Specific  algorithms  derived  via  EM 
vary  considerably  in  nature,  depending  strongly  on  the 
parametric  form  of  the  particular  PDF's  that  are  used.  EM 
algorithms  often  take  huge  steps  toward  the  solution  in  the 
very first few steps (a phenomenon unaccounted for in the
convergence literature), but it remains a complaint that they
are asymptotically only linearly convergent. For numerical
purposes,  a  hybrid  algorithm  seems  potentially  useful: 
initialize  with  EM,  and  finish  with  a  quadratically  convergent 
method  such  as  Newton-Raphson.  Ultimately,  the  real  value 
of  the  method  of  EM  may  reside  more  in  the  way  of  thinking 
and  modeling  that  it  suggests  for  new  and  challenging 
problems than in any resulting algorithm.

Implicit  within  the  marginalization  interpretation  of  semiotic 
compounds  is  a  randomization  concept  that  is  essentially 
Bayesian  in  nature.  Consequently,  when  detection, 
classification,  and  estimation  problems  are  undertaken  in 
contexts  (e.g.,  high  SNR)  in  which  decision  errors  are  very 
unlikely,  the  use  of  the  traditional  hierarchical  conditioning 
may  be  satisfactory  for  many  applications.  However,  when 
decision  errors  become  common,  special  decision  support 
systems  (i.e.,  expert  systems)  are  needed  to  protect  against 
them  and  to  accommodate  conflicting  decisions.  Over  the  last 
decade,  Bayesian  networks  (see,  e.g.,  [10])  have  been  proposed 
and  developed  for  reasoning  under  uncertainty  with 
mathematical  guarantees  of  consistency.  In  environments 
where  reasoning  under  uncertainty  is  the  central  and  pivotal 
issue, semiotic compounds are potentially useful models.

Abstract. A new type of statistical estimation principle is
suggested.  It  is  based  on  the  Einsteinian  interpretation  of  the 
spectrum  as  a  pdf.  I  discuss  its  relations  to  and  distinctness 
from  the  classical  estimation  principles  of  maximum 
likelihood,  information,  and  entropy.  The  new  principle  can 
be  interpreted  as  a  relaxation  to  equilibrium  of  a  physical 
system  consisting  of  the  "world"  and  the  estimation  system. 
The  new  estimation  principle  has  been  successfully  tested  in  a 
number  of  applications.  Performance  exceeding  classical 
estimation  principles  has  been  indicated. 


1.  INTRODUCTION 

Fundamental  principles  of  statistical  estimation  include 
maximization  of  likelihood,  maximization  of  mutual 
information,  and  related  techniques.  Maximization  of 
likelihood (ML) is the most established principle, dating to
Bayes. According to ML, unknown parameters are
estimated by maximizing a probability density function (pdf)
of the available data. ML has certain optimal properties:
it is asymptotically unbiased and efficient. Information or
entropy maximization has been used since the 1950s. Kullback
and Leibler¹, following Khinchin, developed a measure of
information distance that was used to develop
Khinchin-Kullback-Leibler estimation approaches, also known as
maximum entropy (ME), minimum cross entropy (MCE), and
minimum discrimination information (MDI)²,³. The ME
philosophy  was  formulated  by  Jaynes^  as:  "maximally 
noncommittal  with  regards  to  missing  information".  A  related 
minimum  cross-entropy  principle^  minimizes  "information 
measure  necessary  to  change  a  prior  pdf  into  estimated  pdf". 
Of  a  particular  interest  is  Shore's  comment^  that  the  ML 
estimation  is  justified  from  the  basic  principles  only  if  a  model 
is  exactly  correct  (for  some  set  of  the  model  parameters), 
while  MCE  does  not  rely  on  this.  Thus,  he  argued  that  MCE  is 
a  more  general  approach  than  ML,  in  that  it  is  justified  even  if 
we  know  that  our  models  are  approximate. 

I propose a new estimation principle, which is related to
but distinct from the classical estimation principles discussed
above. Following Einstein and Hopf^, I consider a different
type of pdf than is usually considered in statistical estimation.
Since the concept of a pdf is the basis for statistical estimation,
likelihood, and information measures, the new principle leads
to different results than the classical ones. A specific point of
difference is the attribution of randomness. Usually, in
statistical estimation, randomness is attributed to the measured
quantities. For example, if an image is produced by
light intensity, the intensity of a pixel is considered a
random quantity, and pdf models are designed to model this
randomness. Or, if a radar measures Doppler spectra, the signal
intensity or power in a Doppler cell (a sample) is considered a
random quantity, and pdf models are constructed accordingly
(say, for the spectrum it could be a density). Contrary to
this, Einstein interpreted the spectrum as a pdf of photon
frequency. That is, the Doppler cell frequency is to be
considered random, rather than the intensity or power.

I  develop  the  Einsteinian  concept  further,  and  relate  it  to 
the  type  of  estimation,  which  is  usually  considered  in 
statistical  physics.  Namely,  I  relate  the  new  estimation 
principle  to  the  process  of  relaxation  to  equilibrium  in  a 
physical  ensemble.  The  ML  principle,  when  applied  to 
"Einsteinian"  likelihood,  results  in  the  same  estimation 
equations  as  the  maximum  entropy  principle  used  to  find  the 
equilibrium  of  a  physical  system.  Also,  the  new  estimation 
principle  is  interpreted  as  a  maximization  of  mutual 
information  in  the  model  about  the  data.  The  "Einsteinian" 
likelihood  and  pdf  models  are  applicable  to  signals  or  images 
formed  by  measuring  the  energy  (spectrum)  of 
electromagnetic  or  acoustic  waves,  which  are  widely  used 
types  of  data. 

Many  statistical  spectrum  estimation  methods  are  based 
on  parametric  spectrum  models.  This  permits  one  to  utilize 
information  about  signal  phenomenology  and  enables  one  to 
achieve  a  better  accuracy  than  by  using  non-parametric 
estimation  approaches.  A  number  of  models  have  been 
utilized  for  this  purpose,  suitable  for  broad  categories  of 
signals,  including  autoregressive,  moving  average,  their 
combination,  and  sinusoids  in  noise.  Also,  a  number  of 
parameter  estimation  methods  have  been  developed,  many  of 
which  are  based  on  classical  ML  or  maximum  entropy 
principles^.  Types  of  spectrum  models  and  the  estimation 
technique  developed  here  are  different  from  those  previously 
discussed.  In  particular,  they  are  applicable  to  often 
encountered  cases  when  multiple  interfering  phenomena  are 
present^"11, and when classical techniques may be
inapplicable.  Also,  the  new  estimation  technique  described 
here  results  in  more  efficient  estimation  as  compared  to 
classical  ML  and  ME  methods,  even  when  applied  to  classical 
autoregressive  problems^. 

Section  2  discusses  Einstein's  interpretation  of  photon 
spectra  for  electromagnetic  waves,  introduces  a  pdf  model  for 
signal  spectra  compatible  with  Einstein's  ideas,  and  derives  the 
ML  estimators  for  the  model  parameters.  I  discuss  the 
relationships  between  the  ML  and  the  entropy  or  information 
maximization  and  address  specific  similarities  and  differences 
between the new estimation principle and the classical ones.
Section  3  extends  these  results  to  images  and  higher- 
dimensional  cases.  Section  4  contains  a  discussion;  also,  it 
briefly  overviews  and  provides  references  to  the  applications 
and  numerical  comparisons  that  have  been  implemented  and 
published. 


2.  EINSTEINIAN  MODEL  FOR  SPECTRUM 
ESTIMATION 

Einstein  interpreted  the  electromagnetic  spectrum  as  a 
probability  density  function  (pdf)  of  the  photon  energy^.  A 
similar  interpretation  is  valid  for  phonons  of  acoustic  spectra 
(speech,  seismic  signals,  etc.)  and  for  any  signal  field  obeying 
Bose-Einstein's statistics (bosons). A photon energy $\varepsilon$ is
related to its frequency $\omega$,

$$\varepsilon = \hbar\omega, \tag{1}$$


therefore, according to the Einsteinian interpretation, spectral
models, $F(\omega)$, are to be interpreted as frequency pdfs, or, in
other words, proportional to the number of photons with
frequency $\omega$. This interpretation requires proper
normalization; since the empirical spectrum $S(\omega)$ is measured in
units of energy, its interpretation as a pdf requires
normalization by the photon energy. We, therefore, consider the
number of measured photons

$$N_\omega = S(\omega)/\hbar\omega, \qquad \sum_\omega N_\omega = N, \tag{2}$$

and $F(\omega)/\hbar\omega$ is a model pdf for a single photon with frequency
$\omega$, that is, an expected value $E\{N_\omega/N\}$ normalized as follows

$$\sum_\omega F(\omega)/\hbar\omega = 1, \qquad F(\omega)/\hbar\omega = E\{N_\omega/N\}. \tag{3}$$


We consider discrete empirical spectra and their models, so
that $N_\omega$ is the number of photons in the appropriate interval
of frequencies and $F(\omega)$ is their normalized expected value.
We also assume that a small interval of frequencies around
zero is excluded, so that normalization (3) is possible. In
many applications zero frequency is excluded, because the
physical carrier signal frequency is much larger than the
frequency variations of interest. In macroscopic systems,
photons are statistically uncorrelated (most often, this is also
true for microscopic systems). Therefore, for an
ensemble of photons n = 1,..., N, the joint pdf or likelihood L
is a product over individual photons

$$L = \prod_n F(\omega_n)/\hbar\omega_n = \prod_\omega [F(\omega)/\hbar\omega]^{N_\omega} = \prod_\omega [F(\omega)/\hbar\omega]^{S(\omega)/\hbar\omega}. \tag{4}$$

The second equality here is obtained as follows. The product
over individual photons, n, is split into two factors: a product
over photons with a fixed frequency $\omega$ and a product over
the various frequencies $\omega$. There are $N_\omega$ photons with a fixed
frequency $\omega$, all distributed according to the identical pdf model
$F(\omega)$; this leads to the above expression. The likelihood (4)
does not account for the variability in the data associated with
differences between a single realization of the empirical data
$N_\omega$ and the model of its expected value. For a quantum
system containing a small number of photons this might be a
significant deficiency, but for classical systems containing
large numbers of photons, expression (4) is adequate.
According to the ML principle, the likelihood (4) or its
logarithm should be maximized in the estimation process. If
the number of parameters, $N_{par}$, varies in the estimation
process, the proper quantity to be maximized is called the
Akaike Information Criterion, AIC; it is obtained by
subtracting $N_{par}/2$ from the log-likelihood,

$$\mathrm{AIC} = \log L - N_{par}/2. \tag{5}$$
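Equations (2) through (5) can be illustrated numerically. The following is a minimal sketch, not the author's implementation: the function names, the toy spectrum, and the choice of units with hbar = 1 are all assumptions made for illustration. It converts an energy spectrum into photon numbers, evaluates the logarithm of the likelihood (4), and forms the AIC as defined in (5).

```python
import math

HBAR = 1.0  # assumption: units chosen so that hbar = 1


def photon_counts(spectrum, freqs, hbar=HBAR):
    """Eq. (2): N_w = S(w) / (hbar * w) for every frequency bin."""
    return [s / (hbar * w) for s, w in zip(spectrum, freqs)]


def log_likelihood(spectrum, model, freqs, hbar=HBAR):
    """Logarithm of Eq. (4): sum over bins of N_w * ln[F(w) / (hbar * w)]."""
    counts = photon_counts(spectrum, freqs, hbar)
    return sum(n * math.log(f / (hbar * w))
               for n, f, w in zip(counts, model, freqs) if n > 0)


def aic(spectrum, model, freqs, n_par, hbar=HBAR):
    """Eq. (5) as defined in the text: log L - N_par / 2."""
    return log_likelihood(spectrum, model, freqs, hbar) - n_par / 2.0


# Toy data (hypothetical): four frequency bins, zero frequency excluded.
freqs = [1.0, 2.0, 3.0, 4.0]
spectrum = [2.0, 8.0, 6.0, 4.0]            # energy per bin
counts = photon_counts(spectrum, freqs)
total = sum(counts)

# Two candidate models, both normalized as required by Eq. (3).
matched = [HBAR * w * n / total for w, n in zip(freqs, counts)]
flat = [HBAR * w / len(freqs) for w in freqs]

ll_matched = log_likelihood(spectrum, matched, freqs)
ll_flat = log_likelihood(spectrum, flat, freqs)
```

In this toy case the model whose photon pdf equals the measured photon fractions attains the larger log-likelihood, as the ML principle requires of any properly normalized competitor.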


The specific shape of the parametric model for $F(\omega)$ can
be determined based on the physics of the process under
consideration^'^, or a general type of flexible parametric
model can be selected as in [6, 11] or below. Often, signals
can be considered as being produced by incoherent
contributions from several sources. In such a case, the
spectrum is a sum of individual source contributions,

$$F(\omega) = \sum_m F(\omega|m), \qquad m = 1, \ldots, M. \tag{6}$$


Such a model for a pdf is called in statistics a mixture model,
and we call each mixture component $F(\omega|m)$ a sub-model
corresponding to source m. The conditional pdfs of
individual sources can in many cases be modeled by
Gaussian densities. Gaussian models are widely applicable,
because a superposition of Gaussian models can be used to
model any function. The Gaussian model is given by

$$F(\omega|m) = \hbar\omega\, A_m\, G(\omega|m)\, \Delta\omega, \qquad m = 1, \ldots, M;$$

$$G(\omega|m) = (2\pi)^{-1/2}\, \sigma_m^{-1} \exp\{-0.5\,(\omega - \omega_m)^2 / \sigma_m^2\}. \tag{7}$$

In the above equations, $A_m$ is a sub-model weight, $\Delta\omega$ is a
sampling interval, $\omega_m$ is the sub-model mean frequency, and
$\sigma_m$ is the sub-model frequency standard deviation. A
multiplicative term $\hbar\omega$ in Eq. (7) is introduced according to
the normalization requirement (3), so that $[A_m G(\omega|m)]$ is
measured in units of a photon number, and $G(\omega|m)$ is
interpreted as the conditional pdf of the photon frequency from
source m. The wide applicability of the above model is due to
the fact that Gaussian functions form an overcomplete set of
functions, so that any function $N_\omega$ can be modeled using eqs.
(6) and (7).
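The mixture model of eqs. (6) and (7) can be written down in a few lines. This is a hedged sketch under stated assumptions: hbar = 1, a sampling interval of one bin, hypothetical function names, and arbitrarily chosen illustrative parameters. It builds F(w) on a frequency grid and evaluates the normalization sum of eq. (3).

```python
import math


def gaussian(w, mean, sigma):
    """G(w|m) of Eq. (7): a normal density in frequency."""
    z = (w - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_spectrum(freqs, params, d_omega=1.0, hbar=1.0):
    """Eqs. (6)-(7): F(w) = sum_m hbar * w * A_m * G(w|m) * d_omega.

    params is a list of (A_m, w_m, sigma_m) tuples, one per sub-model.
    """
    return [sum(hbar * w * a * gaussian(w, mu, sig) * d_omega
                for a, mu, sig in params)
            for w in freqs]


# Illustrative grid and parameters (assumptions, not taken from the paper).
freqs = [float(k) for k in range(1, 201)]        # bins 1..200, zero excluded
params = [(0.7, 60.0, 5.0), (0.3, 120.0, 10.0)]  # weights A_m sum to one
model = mixture_spectrum(freqs, params)

# Normalization of Eq. (3) with hbar = 1: sum over bins of F(w) / w.
norm = sum(f / w for f, w in zip(model, freqs))
```

Because the weights sum to one and both Gaussians lie well inside the grid, `norm` comes out equal to one to within rounding, which is the condition that lets F(w)/(hbar w) be read as a photon pdf.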

The mixture model specified by (6) and (7) is
characterized by three parameters per Gaussian sub-model: the
weight, the mean, and the standard deviation. The ML
estimation equations for these parameters are derived by
maximizing the likelihood L in (4), or its logarithm, LL = ln L.
Constraint (3) can be accounted for by using the method of
Lagrange multipliers, resulting in the following ML equations:

$$A_m = N_m / N, \qquad N_m = \sum_\omega P(m|\omega)\,[S(\omega)/\hbar\omega], \qquad N = \sum_\omega S(\omega)/\hbar\omega, \tag{8}$$

$$\omega_m = \sum_\omega P(m|\omega)\,[S(\omega)/\hbar\omega]\;\omega \,/\, N_m, \tag{9}$$

$$\sigma_m^2 = \sum_\omega P(m|\omega)\,[S(\omega)/\hbar\omega]\,(\omega - \omega_m)^2 \,/\, N_m. \tag{10}$$

The term $P(m|\omega)$,

$$P(m|\omega) = F(\omega|m) \,/\, \Bigl[\sum_{m'} F(\omega|m')\Bigr], \qquad \sum_m P(m|\omega) = 1, \quad m = 1, \ldots, M, \tag{11}$$

has the meaning of the a posteriori Bayes probability that a
photon at frequency $\omega$ has originated from the source (or
sub-model) m. Thus, $N_m$ is the number of photons from the
source m, and N is the total number of photons.

The ML estimation eqs. (8) through (11) do not yield an
immediate estimate of the model parameters, because the
probabilities $P(m|\omega)$ on the right-hand side of eqs. (8) through
(10) depend on the unknown parameter values, according to
(11). These equations can be considered as defining an
iterative system: beginning with some values of the parameters,
the sub-models $F(\omega|m)$ are computed and the probabilities are
computed according to (11); on the next iteration, the
parameter values are recomputed according to (8) through
(10), and so on, until convergence. Convergence is determined
by requiring that parameter changes are small from one iteration
to the next. Convergence is always attained; this is a
property of the expectation-maximization (EM)
algorithm^. If the number of sub-models, M, is to be
estimated from the data, this is accomplished by maximizing
eq. (5).
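The iteration described above can be sketched as follows. This is an illustrative implementation under explicit assumptions, not the author's code: the function names are hypothetical, the data are synthetic and noise-free, the number of sub-models is fixed at two, and the variance update of eq. (10) uses the mean just recomputed from eq. (9), which is the usual EM convention. The factor hbar * w * d_omega is the same for every sub-model at a given frequency, so it cancels in eq. (11) and is omitted.

```python
import math


def gaussian(w, mean, sigma):
    """G(w|m) of Eq. (7)."""
    z = (w - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def em_step(freqs, counts, params):
    """One pass through Eqs. (8)-(11).

    counts[k] is the photon number S(w_k) / (hbar * w_k);
    params is a list of (A_m, w_m, sigma_m).
    """
    n_total = sum(counts)                                  # N in Eq. (8)
    # Eq. (11): posterior probability P(m|w) for every frequency bin.
    post = []
    for w in freqs:
        f = [a * gaussian(w, mu, sig) for a, mu, sig in params]
        tot = sum(f)
        post.append([fm / tot if tot > 0.0 else 0.0 for fm in f])
    new_params = []
    for m in range(len(params)):
        n_m = sum(p[m] * n for p, n in zip(post, counts))            # Eq. (8)
        mean = sum(p[m] * n * w
                   for p, n, w in zip(post, counts, freqs)) / n_m    # Eq. (9)
        var = sum(p[m] * n * (w - mean) ** 2
                  for p, n, w in zip(post, counts, freqs)) / n_m     # Eq. (10)
        new_params.append((n_m / n_total, mean, math.sqrt(var)))
    return new_params


def fit(freqs, counts, params, tol=1e-10, max_iter=1000):
    """Repeat em_step until no parameter changes by more than tol."""
    for _ in range(max_iter):
        new = em_step(freqs, counts, params)
        change = max(abs(a - b) for p, q in zip(new, params)
                     for a, b in zip(p, q))
        params = new
        if change < tol:
            break
    return params


# Synthetic, noise-free photon counts from a known two-source mixture.
freqs = [float(k) for k in range(1, 201)]
truth = [(0.7, 60.0, 5.0), (0.3, 120.0, 10.0)]
counts = [1.0e4 * sum(a * gaussian(w, mu, sig) for a, mu, sig in truth)
          for w in freqs]

# Start from deliberately wrong parameters and iterate to convergence.
estimate = fit(freqs, counts, [(0.5, 50.0, 8.0), (0.5, 130.0, 8.0)])
```

With these well-separated sources the iteration recovers the generating weights, means, and standard deviations. Real spectra are noisy and sources may overlap, in which case the result can depend on the starting values.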

It should be noted that the above Eqs. (8) through (11) are
approximate. The approximation is in considering the
Gaussian densities to be normalized for a discrete set of
frequencies,

$$\sum_\omega G(\omega|m)\, \Delta\omega = \int G(\omega|m)\, d\omega = 1. \tag{12}$$

A numerical evaluation shows that for $\sigma_m > 1$ (in units of
sample numbers), the approximation is accurate.
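The accuracy statement for eq. (12) is easy to check numerically. The sketch below is an assumption-laden illustration (hypothetical function names, a sampling interval of one bin, and an arbitrary off-grid mean of 50.37): it sums a sampled Gaussian over integer bins and records the deviation from one for several standard deviations.

```python
import math


def gaussian(w, mean, sigma):
    """G(w|m) of Eq. (7)."""
    z = (w - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def discrete_norm(mean, sigma, half_width=200):
    """Left-hand side of Eq. (12) with d_omega = 1 (one sample per bin)."""
    centre = round(mean)
    return sum(gaussian(float(k), mean, sigma)
               for k in range(centre - half_width, centre + half_width + 1))


# Deviation of the discrete sum from one, for several widths in bins.
errors = {sigma: abs(discrete_norm(50.37, sigma) - 1.0)
          for sigma in (0.3, 0.5, 1.0, 2.0)}
```

For widths of one bin or more the deviation is negligible, while for widths well below one bin it becomes large and depends on where the mean falls relative to the grid, which is consistent with the condition stated in the text.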

Below, I discuss two alternative interpretations of the
above estimation procedure. First, the estimation is
interpreted as a relaxation to equilibrium of the photon
ensemble, leading to a relationship between the "Einsteinian"
likelihood and entropy. Second, the estimation is
interpreted as maximization of the information contained in
the internal model of an estimation system about the world.

Equilibrium of the Photon Ensemble.

I will demonstrate that the ML eqs. (8) through (11)
describe the equilibrium state of a system consisting of the
physical ensemble of photons and the estimation system,
whose degrees of freedom are the estimated parameters of the
model. The average number of observed photons is
proportional to the number of photon physical states.
Therefore, according to Einstein's interpretation, a spectrum
sub-model $F(\omega)$ is interpreted as being proportional to the
number of physical states, $\Phi_\omega$, for a single photon at each
frequency,

$$F(\omega) = \mathrm{const} \cdot \hbar\omega \cdot \Phi_\omega. \tag{13}$$

The equations of physical equilibrium in our case can be
derived by using a standard textbook procedure^, which is
now briefly described (for the classical limit, when
$N_\omega / \Phi_\omega \ll 1$). Since our estimation procedure deals with fixed
observation data, the number and energy of photons are fixed,
which should be accounted for in the estimation procedure. In
statistical physics, a system with fixed energy and number
of particles is called a microcanonical ensemble. The
equilibrium of a microcanonical ensemble is obtained by
maximizing the entropy of the ensemble, E, subject to the
constraints of conservation of energy $\varepsilon$ and photon number N.
According to eqs. (1), (2), and (3), this results in the following
constraints on our model $F(\omega)$,

$$\varepsilon = \sum_\omega S(\omega) = N \sum_\omega F(\omega); \qquad N = \sum_\omega S(\omega)/\hbar\omega = N \sum_\omega F(\omega)/\hbar\omega. \tag{14}$$

The ensemble entropy, E, is the logarithm of the number of
states available to the system, $\Gamma$. This number is a product of the
numbers of states available at each frequency, $\Gamma_\omega$, and its
computation is a standard textbook exercise^. For $N_\omega$ photons with
frequency $\omega$, the total number of states $\Gamma_\omega$ is

$$\Gamma_\omega = (\Phi_\omega)^{N_\omega} / (N_\omega!), \tag{15}$$

where the numerator is the total number of all combinations
and the denominator accounts for the permutations of the
equivalent photons. Thus, we obtain for the entropy of the
photon ensemble,

$$E = \ln \Gamma = \ln \prod_\omega \Gamma_\omega = \sum_\omega N_\omega \ln[\Phi_\omega / N_\omega]. \tag{16}$$
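The step from (15) to (16) is not spelled out in the text. A minimal sketch of it, assuming Stirling's approximation $\ln N_\omega! \approx N_\omega \ln N_\omega - N_\omega$ applies (that is, each occupied frequency interval holds many photons), is:

```latex
\begin{align*}
\ln \Gamma_\omega &= N_\omega \ln \Phi_\omega - \ln N_\omega! \\
  &\approx N_\omega \ln \Phi_\omega - N_\omega \ln N_\omega + N_\omega \\
  &= N_\omega \ln(\Phi_\omega / N_\omega) + N_\omega , \\
E = \sum_\omega \ln \Gamma_\omega
  &\approx \sum_\omega N_\omega \ln(\Phi_\omega / N_\omega) + N .
\end{align*}
```

The additive term N is fixed by the data, so it does not affect the maximization and can be dropped, which leaves the form given in eq. (16).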

If the number of parameters varies in the estimation process,
the varying degrees of freedom associated with the estimation
system have to be accounted for in the estimation process.
This is accomplished by subtracting 1/2 for every degree of
freedom, that is, subtracting the number of parameters divided
by two, $N_{par}/2$, from the above expression; this procedure
follows from first principles and is well known in
statistical physics and in statistical estimation theory [12, 14].
Combining eq. (16) with this additional term and with eqs. (2)
and (13),

$$E = \sum_\omega [S(\omega)/\hbar\omega] \ln[F(\omega) / (\mathrm{const} \cdot S(\omega))] - N_{par}/2. \tag{17}$$

Maximization of this expression subject to constraints (14) can
be carried out by using the method of Lagrange multipliers. It
leads to the same equations (8) through (11). Thus, ML
estimation using the "Einsteinian likelihood" is equivalent to
finding an equilibrium of the estimation system and photon
ensemble.
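One way to see the connection is to compare the two objective functions directly. The following is a sketch rather than the full Lagrange-multiplier calculation; it uses only eqs. (2), (4), and (17), with $c$ standing for the constant written as "const" in eq. (17):

```latex
\begin{align*}
E + N_{par}/2
  &= \sum_\omega N_\omega \ln\frac{F(\omega)}{c\,S(\omega)} \\
  &= \sum_\omega N_\omega \ln\frac{F(\omega)}{\hbar\omega}
     + \sum_\omega N_\omega \ln\frac{\hbar\omega}{c\,S(\omega)} \\
  &= \ln L - \sum_\omega N_\omega \ln\bigl(c\,N_\omega\bigr).
\end{align*}
```

The last sum depends only on the measured data, so the entropy (17) and the AIC (5) differ by a term that is constant with respect to the model parameters.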


Information  in  the  Model  about  the  Data 

In his classical work^, Shannon introduced the concept of
the mutual information contained in the received message
about the sent message. For our purpose, this concept can be
formulated as follows. We identify the sent message with the
measured data $S(\omega)$ and the received message with the internal
model, $F(\omega)$. Then, the mutual information in the model about
the data is given by (see Appendix for details)

$$I = \sum_\omega [S(\omega)/\hbar\omega] \ln[F(\omega)/\hbar\omega]. \tag{18}$$

Maximization of the entropy (17), for the estimation purpose,
is equivalent to maximization of the mutual information (18).

The information or entropy maximization discussed in
this paper is connected to the maximum entropy estimation
principle that has been used since the 1950s. There are significant
similarities and differences between our approach and other
methods under similar names, so it is useful to provide a brief
overview of the history and literature concerning the roots of
maximum entropy estimation in statistics. Kullback and
Leibler^, following Khinchin, developed a measure of
information distance that was used to develop the Khinchin-
Kullback-Leibler estimation approaches, also known as
maximum entropy (ME), minimum cross entropy (MCE), and
minimum discrimination information (MDI) [2, 3]. The ME
philosophy was formulated by Jaynes^ as: "maximally
noncommittal with regards to missing information". In ME
estimation, the problem is formulated as follows. Estimate a
pdf, q(n), given a set of linear constraints on q. ME
estimation consists in maximizing the entropy defined as

$$\max E; \qquad E = -\sum_n q(n) \ln q(n). \tag{19}$$

According to the ME philosophy, ME estimates a function
q(n) by maximizing its randomness, while satisfying the
constraints. When a prior guess p(n) estimating q(n) is known
in addition to the constraints, MCE is used, which minimizes
the cross-entropy CE,

$$\min CE; \qquad CE = \sum_n q(n) \ln[q(n)/p(n)], \tag{20}$$

subject to the constraints. MCE minimizes the "information
measure necessary to change p(n) into q(n)", subject to the
constraints. Note the difference between eq. (20) and our
eq. (17): in MCE, as in ME, the sum is weighted with the
sought function q(n), while in our definition, the sum is
weighted with the measured numbers of photons. Shore^
emphasized that ML estimation is justified from basic
principles only if a model is exactly correct (for some set of
parameters), while MCE does not rely on this. Thus, he
argued that MCE is a more general approach than ML, in that
it is justified even if we know that our models are
approximate. This is also true for our "Einsteinian" mutual
information: the developed method extracts the maximal
information about the data that can be extracted with a
given model, even if the model is approximate.
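The difference in weighting noted above can be made concrete. The following sketch is illustrative only: the function names and the four-bin toy distributions are assumptions. It evaluates the ME entropy (19), the MCE cross-entropy (20), and the "Einsteinian" information (18) with the model pdf F(w)/(hbar w) passed in directly.

```python
import math


def entropy(q):
    """Eq. (19): E = -sum_n q(n) ln q(n)."""
    return -sum(x * math.log(x) for x in q if x > 0.0)


def cross_entropy(q, p):
    """Eq. (20): CE = sum_n q(n) ln[q(n) / p(n)], weighted by the sought q."""
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0.0)


def einsteinian_information(counts, model_pdf):
    """Eq. (18): I = sum_w N_w ln[F(w) / (hbar * w)], weighted by the data.

    model_pdf[k] stands for F(w_k) / (hbar * w_k).
    """
    return sum(n * math.log(f) for n, f in zip(counts, model_pdf) if n > 0.0)


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]
counts = [70.0, 10.0, 10.0, 10.0]   # hypothetical measured photon numbers

info_peaked = einsteinian_information(counts, peaked)
info_uniform = einsteinian_information(counts, uniform)
```

In (19) and (20) the weights are the unknown function q(n) itself, whereas in (18) the weights are the measured photon numbers and only the logarithm contains the model. Here the model that matches the measured fractions carries more information about the data than the uniform one.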


3.  EXTENSION  TO  2-D  AND  HIGHER 
DIMENSIONAL  IMAGES 

The Einsteinian interpretation of spectrum does not have
to be limited to the frequency-domain spectra, but is naturally
extended to intensity or power densities in any coordinates,
e.g., two-dimensional domains of time-frequency spectra,
regular angle-angle intensity imagery, or higher dimensional
domains such as time-frequency-range-angle imagery^ [10, 11].
The extension is a straightforward one. Let us denote the
general D-dimensional image coordinates as x, imagery data
as S(x), and image models as F(x). The "Einsteinian" likelihood
expression and the ML equations are similar to the ones in the
previous section,

$$\log L = \sum_x [S(x)/\hbar\omega] \log[F(x)/\hbar\omega]. \tag{21}$$

For the mixture model with D-dimensional Gaussian
components (sub-models),

$$F(x) = \sum_m F(x|m), \qquad F(x|m) = \hbar\omega\, A_m\, G(x|m),$$

$$G(x|m) = (2\pi)^{-D/2} (\det C_m)^{-1/2} \exp\{-0.5\,(x - x_m)^T C_m^{-1} (x - x_m)\}. \tag{22}$$

Again, $A_m$ is a sub-model weight, $x_m$ is the sub-model mean
position vector, and $C_m$ is the sub-model covariance,
determining the shape of the sub-model in D-dimensional
x-space. The term $[A_m G(x|m)]$ is measured in units of a
photon number, and $G(x|m)$ is interpreted as the conditional
pdf of photons from source m. The ML estimation
equations for the model parameters are:

$$A_m = N_m / N, \qquad N_m = \sum_x P(m|x)\,[S(x)/\hbar\omega], \qquad N = \sum_x [S(x)/\hbar\omega],$$

$$x_m = \sum_x P(m|x)\,[S(x)/\hbar\omega]\; x \,/\, N_m,$$

$$C_m = \sum_x P(m|x)\,[S(x)/\hbar\omega]\,(x - x_m)(x - x_m)^T \,/\, N_m,$$

$$P(m|x) = F(x|m) \,/\, \Bigl[\sum_{m'} F(x|m')\Bigr], \tag{23}$$

where $P(m|x)$ has the meaning of the a posteriori Bayes
probability that a photon with coordinates x has originated
from the source (or sub-model) m. Thus, $N_m$ is the number
of photons from the source m, and N is the total number of
photons. As in the previous section, this set of equations
defines a convergent iterative system. While the Gaussian
mixture model described above can model any intensity
image, it is usually practical only for images composed of
relatively few Gaussian "blobs". More complicated models
that are required for more complicated images can be
developed and estimated by combining the new estimation
principle with a technique described in [10, 11, 16].
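The two-dimensional case of eqs. (22) and (23) can be sketched in the same way as the spectral one. Everything specific below is an assumption made for illustration: the function names, D = 2, a synthetic noise-free 50 x 50 image with two Gaussian "blobs", and the starting values. The covariance is accumulated from raw second moments, which is algebraically the same as the centered sum in eq. (23) evaluated at the updated mean.

```python
import math


def gauss2(x, y, mean, cov):
    """G(x|m) of Eq. (22) for D = 2; cov = (c_xx, c_xy, c_yy)."""
    cxx, cxy, cyy = cov
    det = cxx * cyy - cxy * cxy
    dx, dy = x - mean[0], y - mean[1]
    quad = (cyy * dx * dx - 2.0 * cxy * dx * dy + cxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def em_step_2d(pixels, params):
    """One pass through Eqs. (23).

    pixels: list of (x, y, n), n being the photon number S(x) / (hbar * w).
    params: list of (A_m, (x_m, y_m), (c_xx, c_xy, c_yy)).
    """
    n_total = sum(n for _, _, n in pixels)
    sums = [[0.0] * 6 for _ in params]     # n_m, sx, sy, sxx, sxy, syy
    for x, y, n in pixels:
        f = [a * gauss2(x, y, mean, cov) for a, mean, cov in params]
        tot = sum(f)
        if tot <= 0.0:
            continue
        for m, fm in enumerate(f):
            wgt = n * fm / tot             # P(m|x) * S(x) / (hbar * w)
            s = sums[m]
            s[0] += wgt
            s[1] += wgt * x
            s[2] += wgt * y
            s[3] += wgt * x * x
            s[4] += wgt * x * y
            s[5] += wgt * y * y
    new_params = []
    for n_m, sx, sy, sxx, sxy, syy in sums:
        mx, my = sx / n_m, sy / n_m
        cov = (sxx / n_m - mx * mx, sxy / n_m - mx * my, syy / n_m - my * my)
        new_params.append((n_m / n_total, (mx, my), cov))
    return new_params


# Synthetic, noise-free image built from two known blobs.
truth = [(0.6, (15.0, 18.0), (9.0, 2.0, 4.0)),
         (0.4, (35.0, 30.0), (4.0, -1.0, 6.0))]
pixels = [(float(i), float(j),
           1.0e4 * sum(a * gauss2(float(i), float(j), mean, cov)
                       for a, mean, cov in truth))
          for i in range(50) for j in range(50)]

# Start from rough guesses and run a fixed number of iterations.
params = [(0.5, (10.0, 10.0), (16.0, 0.0, 16.0)),
          (0.5, (40.0, 40.0), (16.0, 0.0, 16.0))]
for _ in range(30):
    params = em_step_2d(pixels, params)
```

For this well-separated example the weights, mean positions, and covariances of the generating blobs are recovered. Images with many overlapping components would call for the more elaborate models mentioned in the text.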

4.  DISCUSSION 

This paper introduced a new estimation principle based on
the Einsteinian interpretation of the spectrum as a probability
density for the photon frequency. Today, the photon
composition of light is well known, and the Einsteinian
interpretation of the spectrum as a frequency pdf is not surprising
and is no different in principle from other distributions
considered in statistical physics. Still, my inspiration for
connecting physical estimation of state distributions with
statistical estimation came from the 1910 paper by Einstein
and Hopf, and I think that the terms "Einsteinian likelihood" and
"Einsteinian information" are justified.

The new estimation principle is applicable to most regular
intensity images and signals produced by classical
(macroscopic) imaging systems. We used the classical limit for
computing the numbers of states; still, the quantum
nature of the electromagnetic (or acoustic) field determines
the statistical properties of a photon (or phonon) ensemble and the
estimation procedure. This can be compared to the Planck
distribution of classical blackbody radiation being
determined by the quantum nature of the electromagnetic field.
The quantum structure of nature is utilized in the new
estimation principle for solving a particular problem of the
classical Shannon information theory. Information is defined
with respect to elementary choices among classes, or
elementary states. In classical information theory, the
definition of elementary states is arbitrary. In the proposed
estimation principle, when interpreted as maximal mutual
information, the elementary states are given by the quantum
states, photons (or phonons).

This paper considered positive-valued intensity or power
signals and images. But it seems that the developed technique
can be extended to non-positive and complex-valued signals as
well. The physical field carrying the signal is composed of
photons or other particles, and the phase of the signal is related
to the phase of the particles. Therefore, one may expect that
phase density models can be developed that relate the signal
phase to the photon (or phonon) density. These models will be
suitable for oscillating and complex-valued signals, and will
correspond to the estimation principle described in this paper.

The "Einsteinian" estimation principle described here has
been applied to a number of diverse applications, on a
somewhat ad hoc basis. It was applied to spectrum estimation
of autoregressive signals^; numerical results for a number of
cases have shown that the new principle results in more
efficient estimators than the classical ML and ME estimators
in the non-asymptotic region. Applications to model estimation
for two-dimensional time-frequency spectra of acoustic signals
were also described in [6]. Modeling of interfering phenomena
in over-the-horizon radar spectra was described in [8, 9] along
with estimation of parameters of specific events of interest due
to ionospheric disturbances. Development of concurrent
detection, classification, and tracking techniques (so-called
track-before-detect) based on the new principle was described
in [10] for radar and in [11] for sonar. Superior performance
to the classical techniques was indicated.
APPENDIX

Consider a "universe" consisting of two interacting
systems, the "world" characterized by the data $\{S(\omega)\}$ and the
"estimation system" characterized by the model $\{F(\omega)\}$.
Mutual information in the model about the data is given by
$I = \ln(\, p(S,F) / [p(S)\,p(F)] \,)$. Here, $p(S,F)$ is the probability (or pdf)
of the data $\{S(\omega)\}$ and estimated model $\{F(\omega)\}$ (in the
interacting universe), and $p(S)$ and $p(F)$ are the probabilities
(or pdfs) for the non-interacting universe. Using
$p(S,F) = p(S|F)\,p(F)$, the mutual information can be rewritten as
$I = \ln(\, p(S|F) / p(S) \,)$, where $p(S|F)$ is a conditional probability (pdf).
These probabilities can be computed by using the corresponding
numbers of states, $I = \ln(\, \Gamma_{S|F} / \Gamma_S \,)$. Here, $\Gamma_{S|F}$ is the
number of the world states compatible with the knowledge of
the data $\{S(\omega)\}$ given the estimated model $\{F(\omega)\}$, and $\Gamma_S$ is
the number of states compatible with the a priori knowledge,
"prior to interaction" between the two systems. Prior
knowledge depends on properties of the model that are known
a priori and do not change in the estimation process; these are
given by $N$ and $\Phi$. By using computations similar to those in the
text, we obtain $\ln \Gamma_{S|F} = \sum_\omega N_\omega \ln[\Phi_\omega]$ and $\ln \Gamma_S = N \ln \Phi$
(in these expressions we have omitted the terms accounting for
permutations of identical photons, because, as we have seen,
these terms do not depend on the model parameters and do not
affect the estimation process). Finally, we obtain for the part
of mutual information that depends on the model parameters,
$I = \sum_\omega N_\omega \ln[\Phi_\omega / \Phi] = \sum_\omega N_\omega \ln[F(\omega)/\hbar\omega]$; this gives eq. (18).
Abstract

An  intelligent  system  must  process  information  as  human 
beings  do.  A  framework  is  presented  wherein  dynamic  cognitive 
structures  called  chains  of  thought  are  used  to  embody  the  dynamic 
aspects  of  the  intentions  of  intelligent  agents.  These  chains  of 
thought  are  represented  as  fuzzy  cognitive  structures.  In  this 
particular  framework,  structure  mapping  plays  an  important  role  in 
the  notion  of  misconceptions.  An  expert  agent  has  to  deduce  the 
"private"  symbolic  constructs  of  a  novice  agent  in  order  to 
correctly  guide  the  novice  through  a  particular  task.  These 
"private"  constructs  are  derived  from  (partially)  corresponding 
"public"  symbolic  constructs  that  are  a  result  of  intelligent 
symbolic  communication  between  the  expert  agent  and  novice 
agent.  This  whole  process  embodies  what  the  author  refers  to  as 
cognitive  diagnosis. 

Chains of thought, the "private" symbolic constructs, are the
main focus of the current paper. Their applications in designing
intelligent systems and cognitive diagnosis are emphasized. The
resulting  framework  is  also  an  important  step  towards  developing 
a  method  for  evaluating  degrees  of  intelligence  and  possible 
comparison  between  (groups  of)  intelligent  systems. 

I. Introduction

For  an  artificial  system  to  be  considered  intelligent,  it  must 
process  information  as  human  beings  do.  The  theoretical  work 
presented  in  this  paper  has  two  major  goals.  First,  it  investigates 
and  describes  what  types  of  cognitive  structures  are  used  when 
intelligent  agents  communicate  with  each  other.  Secondly,  it 
attempts  to  capture  the  dynamics  of  these  cognitive  structures 
during  symbolic  communication  between  intelligent  agents. 

Intelligent  agents  express  their  intentions  when  they 
communicate  with  one  another.  In  order  for  this  communication  to 
take  place,  a  common  vocabulary  has  to  be  used.  These  "public" 
symbolic  constructs  make  up  speech,  be  it  spoken  or  otherwise.  In 
addition  to  this  common  vocabulary,  there  needs  to  be  a 
correspondence  between  the  syntax  of  the  language  used  by  each 
participant. As noted by Kohout [2], there must also be semantic
agreement if the conversation is to be successful. Unfortunately,
the transition from "private" intentions to "public" symbolic
constructs is not always clear. This is true not just for agents
actively involved in the conversation (we shall call these
participants), but also for those agents not really involved in the
conversation: spectators (correspondingly, non-participants) just
observing the interaction.

Dynamic  cognitive  structures  must  be  used  by  intelligent 
agents  to  deduce  and  make  decisions.  These  must  also  be  used  to 
express  themselves.  The  whole  process  of  trying  to  determine 
what  one  agent  is  thinking,  or  attempting  to  express,  based  on  a 
dialogue,  their  behavior,  their  actions,  etc.  is  what  initiated  the 
work  presented  in  this  paper. 

This work has relatively close ties to the field of semiotics, a
discipline combining the theory of signs, symbols, and meaning
extraction. In the next few sections, we explore the area of
intelligent tutoring systems as a foundation for the work by Juliano
and Bandler [1]. Next, the dynamic cognitive structures that
Juliano and Bandler call chains of thought [1] are introduced. A
brief discussion of the implications of this work is presented and
further work is also proposed.

II. Tutoring Systems and Deducing Misconceptions

Intelligent  tutoring  systems 

Intelligent  tutoring  systems  (or  ITSs)  provide,  perhaps,  the 
best  ground  work  for  investigating  the  cognitive  structures  of 
interest  in  this  study.  ITSs  are  systems  designed  to  aid  in  tutoring 
students  on  a  particular  subject  matter.  These  systems  are 
supposed  to  possess  some  form  of  intelligence  by  having 

1. some form of model of the student; and

2. some reasoning capabilities based on this model.

The most common approach in the design of an ITS is to gain a
better understanding of how we as humans perform in a
tutoring role. In particular, we need to investigate how we
represent and process information based on the two items listed
above. This entails an approximation of the thinking process using
some knowledge structure based on a system's "own" expert
knowledge structure. The process of "thinking about thinking" is
what Juliano and Bandler [1] have referred to as cognitive
diagnosis, which includes any method of confirming the relative
position of the novice student's thinking pattern based on a
corresponding idealized pattern.

The approach we take is based on cognitive diagnosis during
problem solving. In this scenario, the student is given several
problems to solve and the tutor may intervene whenever
appropriate in directing the student to understanding the concepts
involved. It is during this problem-solving session that a model of
the student's understanding has to be formulated and used in
determining when to intervene and how to control the direction of
the session (do we present a simpler problem or do we go with one
that is a little bit more challenging?).

The  process  of  deducing  misconceptions 

Perhaps  the  most  important  factor  in  successful  tutoring  is  how 
to  correctly  deduce  student  misconceptions.  This  is  based  on 
information  gathered  during  a  tutoring  session,  most  likely 
consisting  of  partial  or  incomplete  solutions  to  the  problem  at  hand. 
This information may also contain a lot of noise. The student may
indicate varying degrees of uncertainty (or confidence) in their
solutions. Furthermore, under most normal situations, the
information is available in various forms: verbal, visual
cues, scribbled computations, diagrams or doodles, etc. Clearly, all
of this complicates approximating the process in an artificial system.

To  simplify  the  process,  the  following  main  assumptions  are 
made  for  an  underlying  theory  of  knowledge  states,  which  can  be 
conveniently  represented  in  a  computer: 

1. The knowledge of the expert agent and the knowledge of the
novice agent overlap one another.

2.  A  knowledge  state  describes  a  set  of  concepts  and  the  set  of 
relations  among  these  concepts. 

3.  When  an  expert  agent  approximates  the  knowledge  state  of  a 
novice  agent,  this  information  is  used  to  generate  "lines  of 
reasoning",  which  we  call  chains  of  thought,  to  account  for 
each  observed  action. 

The  first  assumption  is  important  because  it  implies  the  view  that 
expertise  is  decomposable  into  independent  components  that  may 
be  used  to  define  various  dimensions  of  knowledge.  It  also  implies 
that "subsets" inherit essential characteristics of the "full" model;
hence, the novice agent's mastery of a particular component can,
to some extent, be deduced.

The  second  assumption  indicates  an  associative 
representation.  Fuzzy  cognitive  maps  are  used  to  account  for  the 
inherent  vagueness  and  imprecision  in  most  natural  tutoring 
environments  where  decisions  are  based  on  imprecise,  qualitative 
data.  The  third  assumption  adds  an  important  level  of  detail  in  the 
dynamics  of  the  process  of  cognitive  diagnosis. 

This subsethood assumption between the knowledge of the
expert agent and novice agent somewhat parallels Kohout's [2]
note that semantic agreement is required if a conversation between
two agents is to be successful. In his statement, he indicates that
there must be some form of overlap for this to proceed.

The  process  by  which  tutoring  systems  deduce  misconceptions 
by  novice  agents  is  also  similar  to  the  way  intelligent  agents 
perceive  their  surrounding  environment.  In  the  tutoring  scenario, 
the  environment  is  replaced  by  a  particular  novice.  The  expert 
agent  has  to  adjust  to  each  novice  agent  and  base  its  actions  on  its 
current  model  of  that  agent.  Similarly,  intelligent  agents  must 
adapt  to  changes  in  their  environment. 

III. Approximating Chains of Thought

Fuzzy  cognitive  maps 

As indicated in the previous section, one of the assumptions
we have is that knowledge states describe a set of concepts and the
set of relations among these concepts. The dynamic cognitive
structures we use to represent knowledge states are fuzzy cognitive
maps (or FCMs). Mathematically [1], an FCM $M = (C_M, R_M)$ over
a finite universe of discourse, $X$, is a fuzzy graph that is a 2-tuple
where:

$C_M \in [0,1]^X$ is a fuzzy concept space of $X$ (1)

and

$R_M = (R_1, R_2, \ldots, R_{r_M})$ is a fuzzy multirelation (2)

Each $R_k$ (where $1 \le k \le r_M$) is a fuzzy relation on the fuzzy concept
space $C_M$. For more details on the mathematics, operations, etc. on
FCMs, refer to [1].
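
The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `FuzzyCognitiveMap`, its field names, and the genetics concepts are hypothetical, and each fuzzy relation is stored as a sparse mapping from ordered concept pairs to grades in [0, 1]. The operations defined in [1] are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class FuzzyCognitiveMap:
    """M = (C_M, R_M): a fuzzy concept space plus a tuple of fuzzy relations."""

    # C_M: membership grade in [0, 1] for each element of the universe X
    concepts: dict[str, float]
    # R_M: each relation maps an ordered concept pair to a grade in [0, 1]
    relations: list[dict[tuple[str, str], float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject grades outside the unit interval and relations over unknown concepts.
        for name, grade in self.concepts.items():
            if not 0.0 <= grade <= 1.0:
                raise ValueError(f"membership of {name!r} is outside [0, 1]")
        for k, relation in enumerate(self.relations, start=1):
            for (src, dst), grade in relation.items():
                if src not in self.concepts or dst not in self.concepts:
                    raise ValueError(f"R_{k} mentions an unknown concept")
                if not 0.0 <= grade <= 1.0:
                    raise ValueError(f"a grade in R_{k} is outside [0, 1]")

    def strength(self, k: int, src: str, dst: str) -> float:
        """Grade of (src, dst) in R_k (1-based); unlisted pairs have grade 0."""
        return self.relations[k - 1].get((src, dst), 0.0)


# Hypothetical novice knowledge state for an elementary genetics tutor.
novice = FuzzyCognitiveMap(
    concepts={"gene": 0.9, "allele": 0.6, "dominance": 0.3},
    relations=[{("gene", "allele"): 0.7, ("allele", "dominance"): 0.2}],
)
```

Storing the relations sparsely keeps small maps readable; a dense matrix per relation would be an equally valid choice.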

FCMs  are  ideal  representations  of  knowledge  states  not  just 
because  they  capture  the  essence  of  the  first  two  assumptions  listed 
in  the  previous  section.  They  are  ideal  also  because  they  can  be 
encoded  from  the  communication  between  two  intelligent  agents. 
This  gives  us  a  means  of  possibly  representing  "private"  symbolic 
constructs  to  correspond  to  an  agent's  expertise  or  knowledge,  or 
as  an  approximation  of  what  another  agent  is  trying  to  say. 

Chains  of  thought 

Knowledge states alone are not sufficient in capturing the three
assumptions we laid out. To represent lines of reasoning, there has
to be a way to move or transform from one knowledge state to
another. This requires a formal representation of chains of thought.
Mathematically [1], we can define a chain-of-thought structure on
a universe of discourse, $X$, to be a 5-tuple $T = (C, R, \Psi, \Phi, \delta)$
where the pair $C$ and $R$ define an FCM structure based on (1) and
(2), respectively, and

$\Psi \in [0,1]^{*}$ is a knowledge state space (3)

$\Phi \in [0,1]^{*}$ is a valid input space (5)

$\delta : \Psi \times \Phi \to \Psi$ is a transition function (6)

Notice that these definitions have some similarities to the formal
definition of a finite-state sequential machine. For more details on
the mathematics, operations, etc. on chains of thought, refer to [1].
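
The analogy with a sequential machine can be made concrete with a minimal sketch. Everything named here is an assumption made for illustration: a knowledge state is reduced to a mapping from concepts to grades, an input is a mapping from concepts to observed evidence, and `blend_transition` is a hypothetical transition function that moves each observed grade part of the way toward the evidence. The transition function defined in [1] may differ.

```python
from typing import Callable, Iterable

State = dict[str, float]  # a knowledge state: concept -> grade in [0, 1]
Input = dict[str, float]  # an observed action: concept -> evidence in [0, 1]


def blend_transition(state: State, observed: Input, rate: float = 0.5) -> State:
    """Hypothetical transition: move each observed concept toward its evidence."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    nxt = dict(state)
    for concept, evidence in observed.items():
        old = nxt.get(concept, 0.0)
        # A convex combination keeps the grade inside [0, 1].
        nxt[concept] = old + rate * (evidence - old)
    return nxt


def run_chain(
    delta: Callable[[State, Input], State],
    start: State,
    inputs: Iterable[Input],
) -> list[State]:
    """Apply the transition function to each input in turn, recording every state."""
    states = [start]
    for observed in inputs:
        states.append(delta(states[-1], observed))
    return states


# Two observed actions drive the modelled novice through three knowledge states.
chain = run_chain(
    blend_transition,
    {"allele": 0.2},
    [{"allele": 1.0}, {"dominance": 0.5}],
)
```

The returned list of states is the "line of reasoning": one knowledge state per observed action, starting from the initial state.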

Knowledge communication and cognitive diagnosis

What  are  the  distinct  processes  involved  in  approximating 
chains  of  thought?  We  shall  consider  two  intelligent  agents,  A  and 
B,  although  this  could  be  extended  to  more  than  two  agents. 
Without  loss  of  generality,  let  us  assume  that  A  is  the  expert  agent 
and  B  is  the  novice  agent  in  a  tutoring  environment.  Initially,  we 
start  off  with  the  arrangement  given  in  Figure  1  below. 


Figure  1  Initial  phase  of  the  interaction 

The  shaded  region  in  each  agent  denotes  domain  knowledge. 
Recall  that  in  our  first  assumption,  the  overlay  principle 
(subsethood  property)  actually  holds  between  agents. 


Agent  B's  knowledge  is  then  communicated  to  agent  A  in 
some  manner,  possibly  during  a  problem  solving  task.  This  is 
depicted  in  Figure  2. 



Figure  2  Communication  from  the  novice  agent 

Whatever  was  expressed  by  novice  Agent  B  now  has  to  be  encoded 
by  expert  Agent  A.  This  is  done  by  Agent  A  to  either  develop  or 
update  a  model  of  what  Agent  B  is  thinking.  This  is  illustrated  in 
Figure  3. 



Figure  3  Encoding  and  model  generation/updating 
by  the  expert  agent 

After expert Agent A has generated and/or updated its model
of what Agent B is expressing (or thinking), the next step is model
interpretation. In [1], Juliano and Bandler propose the use of a
discrepancy operator to identify misconceptions, if any, indicated
by the model of the novice. Whatever type of operator is used, this
is applied to the expert's domain knowledge and the current model
of the novice. This is depicted in Figure 4.



Figure  4  Model  interpretation  by  the  expert  agent 

Whatever  interpretation  is  derived  by  expert  Agent  A,  this  has  to  be 
conveyed  back  to  novice  Agent  B  in  some  form.  This  is  illustrated 
in  Figure  5.  Hopefully  this  information  will  direct  Agent  B  to  the 
ideal  chain  of  thought. 


Figure  5  Expressing  misconceptions  identified 
by  the  expert  agent 


Next,  notice  that  the  novice  Agent  B  now  undergoes  a  process 
similar  to  that  depicted  in  Figures  3  and  4.  Communication  from 
expert Agent A has to be encoded and this information interpreted
and  possibly  assimilated  into  Agent  B's  current  knowledge 
structure.  Thus,  this  model  has  a  fairly  general  application  for 
intelligent  agents  and  the  modeling  of  some  underlying  cognitive 
processes. 

IV.  Discussion 

The  model  for  cognitive  diagnosis  outlined  in  the  previous 
section  warrants  elaboration.  Firstly,  the  appropriateness  of  using 
FCMs  to  represent  knowledge  states  is  emphasized  by  two  factors 
in  the  model.  Knowledge  communication,  the  act  of  expressing 
one's  knowledge  (see  Figures  2  and  5),  is  primarily  fuzzy  in  nature. 
One can observe how most everyday reasoning and common-sense
knowledge are expressed: they are conveyed imprecisely. When
it comes to approximating this in a machine, this is where the
expressiveness of fuzzy set theory steps in. Furthermore, the
transformation  of  spoken  (or  otherwise)  language  into  an  internal, 
"private"  model  as  in  Figure  3  is  inherently  noisy  and  imprecise  as 
well.  Again,  fuzzy  set  theory  can  capture  this  imprecision. 

The  use  of  fuzzy  graphs,  called  FCMs,  also  facilitates 
modeling  of  the  operation(s)  depicted  in  Figure  4.  In  [1], 
operations  for fuzzy  difference  and  fuzzy  discrepancy  between  two 
FCMs are defined. In a tutoring scenario, these can be used by an
expert  agent  to  approximate  any  deviations  by  the  novice  agent 
from  an  ideal  cognitive  structure.  This  ideal,  of  course,  is  based  on 
and  limited  by  the  expert  agent's  own  domain  knowledge. 
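
To make the idea of comparing an expert's knowledge state with a model of the novice more tangible, here is a minimal sketch. It does not reproduce the fuzzy difference and fuzzy discrepancy operators of [1]; as a stand-in it assumes the standard bounded difference max(0, a - b) on concept grades and a hypothetical threshold for flagging a concept. The function names and the genetics concepts are illustrative only.

```python
def fuzzy_difference(expert: dict[str, float], novice: dict[str, float]) -> dict[str, float]:
    """Bounded difference max(0, e - n): what the expert holds and the novice lacks."""
    return {c: max(0.0, g - novice.get(c, 0.0)) for c, g in expert.items()}


def discrepancy(
    expert: dict[str, float],
    novice: dict[str, float],
    threshold: float = 0.3,
) -> list[tuple[str, float]]:
    """Concepts whose grades differ by more than `threshold`, largest gap first."""
    concepts = set(expert) | set(novice)
    gaps = {c: abs(expert.get(c, 0.0) - novice.get(c, 0.0)) for c in concepts}
    flagged = [(c, g) for c, g in gaps.items() if g > threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Hypothetical states: the novice underrates "dominance" and holds a
# concept ("blending") that is absent from the expert's domain knowledge.
expert_state = {"gene": 1.0, "allele": 0.9, "dominance": 0.9}
novice_state = {"gene": 0.9, "allele": 0.7, "dominance": 0.2, "blending": 0.8}

missing = fuzzy_difference(expert_state, novice_state)
suspects = discrepancy(expert_state, novice_state)
```

Because the comparison is anchored on the expert's own state, anything the expert does not know cannot be diagnosed as missing, which mirrors the limitation noted above.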

FCMs  also  facilitate  the  formal  definition  of  a  chain  of  thought 
structure.  Figure  4  was  meant  to  depict  operations  on  either 
knowledge  states  (using  FCMs)  or  operations  on  lines  of  reasoning 
(using chain of thought structures). Again, in [1], a formal
definition of chain of thought homomorphisms is given that
includes both structure preserving and transition preserving
constraints.

Perhaps  the  biggest  concern  here  is  on  the  encoding  scheme. 
This  is  depicted  in  Figure  3  as  the  transformation  from 
communicated  language  to  model  generation  or  updating.  To 
achieve  semantic  agreement,  a  limit  can  be  placed  on  the 
communications  bandwidth  through  the  design  of  the 
corresponding  user  interface  for  the  system  under  consideration. 
For  example,  in  designing  an  intelligent  tutoring  system  for 
elementary  genetics,  a  graphical  user  interface  can  be  designed  to 
include  such  tools  as  a  calculator,  icons  depicting  various  species 
to "breed", Punnett squares, etc. The interface can be designed so
that the use of each tool corresponds to the expression, as in Figure
2, of a novice's knowledge in FCM (or chain of thought structure)
form. Of course, this does not guarantee that all the necessary tools
will be used in attempting to solve a problem, but this is merely
part of the imprecision and noise inherent in the task at hand.

With this in mind, the expression of misconceptions depicted
in Figure 5 can be done through the reverse transformation. The correct
use of the tools provided by the interface can be emphasized. This,
in conjunction with the elaboration of any other corrections to
misconceptions (most likely at the knowledge state level), can
make for a fairly powerful tutoring tool.

The framework presented here does not apply only to the area
of cognitive diagnosis in intelligent tutoring systems. The panoply of
operations  and  relations  on  the  cognitive  structures  listed  in  [1], 
some  of  which  are  presented  in  this  paper,  also  provides  an 
important  step  towards  developing  systematic  methods  for 
evaluating  degrees  of  intelligence  between  intelligent  agents.  This 
will  also  be  useful  in  comparing  (groups  of)  intelligent  systems. 

V.  Summary  and  Conclusions 

What  was  presented  in  this  paper  is  a  continuation  of  the  ideas 
presented  in  [3]  with  emphasis  on  the  dynamics  of  the  chain  of 
thought  structure.  The  focus  was  on  the  idea  that  misconceptions 
can  be  symbolically  represented  by  FCMs  at  the  knowledge  state 
level,  or  by  chain  of  thought  structures  at  the  reasoning  level.  The 
area  of  cognitive  diagnosis  lends  itself  to  other  areas  similarly 
characterized  by  decision  making  in  complex  systems  with 
uncertainty.  This  uncertainty  may  be  in  the  form  of  incomplete, 
incorrect,  contradictory,  misleading,  or  even  noisy  information. 


Chains  of  thought,  used  to  represent  the  "private"  symbolic 
constructs  of  intelligent  agents,  facilitate  the  identification  and 
symbolic  representation  of  misconceptions.  This  is  an  important 
step  in  the  design  of  intelligent  systems;  in  particular,  systems 
where  cognitive  diagnosis  is  a  significant  function. 

ABSTRACT: Cyberaesthetics addresses an emerging
integrated multi-disciplinary approach to the development of
intelligent systems: combining cybernetics, semiotics and
aesthetics. The paper describes a mathematical apparatus of
intelligence and relates it to Kantian philosophy, which has
room for the concept of beauty.

1.    INTELLIGENCE  AND  AESTHETICS. 

Robots  that  can  read  and  enjoy  literary  texts  are  not  yet  at 
hand.  Notwithstanding,  there  is  a  growing  understanding 
among  engineers  and  mathematicians,  that  intelligent  systems 
of  tomorrow  will  have  to  be  able  to  perceive  the  meaning  and 
intentions  of  concepts,  scenes,  and  texts,  depending  on 
situations  and  contexts.  And,  in  order  to  achieve  this,  robots 
must  be  endowed  with  emotions  and  aesthetics.  The  term 
"robot"  we  use  broadly  as  an  intelligent  system,  an  "infobot". 
The  term  "aesthetics"  we  use  in  the  Kantian  sense  as  a  science 
of  feelings  and  feeling  perception,  a  science  of  emotions.  In 
western  cultural  tradition,  there  is  a  long-standing  opposition 
between  thinking  and  feeling.  And  often,  we  tend  to  forget 
that  this  is  a  highly  abstract,  refined,  top  level  view  of  our 
psyche.  Actual  human  processes  of  intellection  are 
complicated  interwoven  interactions,  vortices  of  multiple 
neural  processes,  involving  various  parts  of  the  brain  and 
various conscious, unconscious, and instinctual levels. A
historic view of rationality as limited to Aristotelian-Gödelian
logic is too narrow and is being rejected today. A new
understanding  of  rationality  emerges  as  a  hierarchical  goal- 
directed  functioning,  which  involves  internal  and  external 
actions;  actions  within  the  mind  of  an  intelligent  system  and 
into  the  outside  world.  This  new  understanding  emerges 
concurrently  in  multiple  fields  dealing  with  phenomena  of 
intelligence:  in  philosophy,  cognitive  sciences,  art  and  art 
criticism,  education,  mathematics  and  engineering. 

If you pinch your finger, it hurts, and an ability to feel
the pain is obviously an a priori faculty, which is necessary
for survival to such an extent that it is shared throughout the
entire animal kingdom. This "lower" origin of feelings
separates them from our higher cognitive abilities. And, there
is a long-standing line of thought that separates and
counterposes feelings and thinking, emotions and intellect. Is there
any ground for their unification? Neural and cognitive
sciences have been concerned with relating emotions to material
neural and bodily physiological functions. For example,
neural pathways have been found from the hypothalamus (brain areas
associated with emotions) to the viscera. The popularly known
connection between fear and upset stomach could possibly be
understood as a survival mechanism regulating interactions
between fear and hunger. This is another example of the "lower"
aspect of emotions. Brain research relating emotions to higher
intellectual functions is still in an incipient stage. Interactions
between cortical systems (associated with high cognitive
functions) and the hypothalamus are hypothesized to be mediated
through the amygdala. The high degree of reciprocal anatomical
connections found among these neural structures suggests the
existence of information processing loops involving emotional
and cognitive functions. Our knowledge of the brain structure
is insufficient for deducing the mathematical theory of
"higher" emotional functions, and many believe that even if
the entire wiring diagram of the brain were available, it still
would not be possible to deduce its main mathematical
concepts. Neural and psychological data have to be combined
with philosophical analysis and physical intuition in order to
develop a mathematical theory of higher emotions.

In 1787, in a letter to his friend, Kant wrote that he had
discovered a new type of a priori principle, the feelings of
pleasure and pain, which he found to be a necessary part of our
intellect. Kant came to the conclusion that our Judgment faculty
is based on the feeling of pleasure caused by the harmony or
correspondence between our internal representations-concepts
and empirical phenomena. The new principle governs "intellectual
emotions". These "higher" emotions are not separated
from thoughts, but they are combined together in a dynamical
process described by Kant in his three volumes that overturned
our understanding of the entire history of the philosophical
analysis of the "mind-body problem" [1781; 1788; 1790].

The present paper describes mathematical aspects of
Cyberaesthetics, combining mathematics with the Kantian theory
of mind. Sect. 2 overviews the mathematical apparatus
of the modeling field theory (MFT), suitable for describing
higher emotions and providing a foundation for the mathematics
of mind in which a concept of beauty could be included.
Sect. 3 briefly reviews Kantian theory and establishes
connections with MFT. Sect. 4 discusses the possibility of an
Emotional Machine and outlines directions of future research.

2.    MODELING  FIELD  THEORY 

2.1 Systems with Adaptive Intelligence

Consider (1) a set of input data $\{X(n), n \in N\}$, where each
member is a vector in D-dimensional space, $X = \{X_d, d = 1, \ldots, D\}$;
(2) a set of categories $\{h \in H\}$, which are characterized by
internal parameters $\{S_h\}$ and by models of the data
$\{M_h(S_h, n)\}$; and (3) a similarity measure between the sets of models
and data, $L(\{X\},\{M\})$. A set of parameters is finite,
$S_h = \{S_h^a, a = 1, \ldots, A\}$, but not necessarily limited, and a set of
categories is not fixed: its cardinality H is finite, but may vary
in the process of learning. A similarity measure is designed so
that it treats each model as an alternative for each piece of data,

$L(\{X\},\{M\}) = \prod_{n \in N} \sum_{h \in H} r(h)\, l(X(n) \mid h)$, (1)

here $l$ is a conditional partial similarity between data vector
$X(n)$ and model $M_h$. For example, $l$ can be selected as a
probability density function. Then L is a total likelihood (this
interpretation does not require statistical independence among
data vectors n and n': dependencies can be accounted for by
considering X(n') as parameters of the models for the data
vector n).
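
As an illustration of eq.(1), the sketch below evaluates the logarithm of the similarity for scalar data. The choices made here are assumptions, not part of the text: the partial similarity is taken to be a one-dimensional Gaussian density, each model is described by a mean and a standard deviation, and the weights r(h) are simply passed in as numbers. Working with ln L avoids the underflow that a long product of small densities would cause.

```python
import math


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """One-dimensional Gaussian density, used here as the partial similarity l."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_similarity(
    data: list[float],
    rates: list[float],
    means: list[float],
    sigmas: list[float],
) -> float:
    """ln L = sum over n of ln( sum over h of r(h) * l(X(n) | h) )."""
    total = 0.0
    for x in data:
        mixture = sum(
            r * gaussian_pdf(x, m, s) for r, m, s in zip(rates, means, sigmas)
        )
        total += math.log(mixture)
    return total


data = [0.1, -0.2, 4.9, 5.2]
# Models centred on the two clumps of data score higher than misplaced ones.
well_placed = log_similarity(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
misplaced = log_similarity(data, [0.5, 0.5], [2.0, 3.0], [1.0, 1.0])
```

Every model contributes to every data point inside the inner sum, which is what "treats each model as an alternative for each piece of data" amounts to in this setting.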

The problem consists in estimating internal parameters S
and in making a choice of the categories (or classification),
that is, in obtaining a partition

$\{N\} = \{N_1 \mid N_2 \mid \ldots \mid N_H\}$, (2)

that decides which model h corresponds to an observation n
(here $N_h$ is a subset of N), by maximizing the similarity
eq.(1). When likelihood is used as a similarity measure, this
is a problem of maximum likelihood estimation.

Categories  activated  by  high  similarity  values  produce 
actions,  including  generation  of  messages  transmitted  within 
the  intelligent  system,  which  acknowledge  the  category 
activation.  Messages  include  the  category  parameters.  They 
can  serve  as  input  data  for  other  categories  and  can  be  used  for 
control  of  actuators.  Correspondingly,  data  are  sensory  data  or 
internal  messages,  and  models  are  patterns  of  messages  and 
sensory  data.  The  above  mathematical  framework  is  fairly 
broad  and  encompasses  most  formulations  of  intelligent 
adaptive  systems.  It  includes  as  particular  cases,  statistical  and 
model-based  pattern  recognition,  complex  adaptive  systems, 
neural  networks,  and  systems  of  intelligent  agents.  The  above 
formulation  addresses  "higher-level"  intelligence  functions  of 
recognition and adaptation. A complete intelligent system
would include lower-level drives for survival, reproduction and,
in the case of an artificial system, a robot or infobot, drives for
performing the specific tasks it was designed for.

When a set of observations, N, corresponds to a
continuous flow of signals, for example, a flow of visual
stimuli in time and space, it is convenient to consider, instead
of eq.(1), its continuous version,

$L = \exp \int_N \ln\Big( \sum_{h \in H} r(h)\, l(X(n) \mid h) \Big)$, (3)

where N is a continuum, such as time-space. In this case,
models describe a continuous modeling field, conditional
partial similarities can be compared to a Lagrangian, and
maximization of similarity L can be compared to
minimization of action in the physical field theory.

2.2  Multiple  Hypothesis  Testing 

A standard method of solving the above problem
[Winston, 1984; Segre, 1992] consists in (1) considering all
possible partitions eq.(2) (within certain a priori established
limits), (2) obtaining an estimate of the parameters $S_h$ for
each partition by maximizing similarity,

$\max_{\{S_h\}} [\, \mathrm{pdf}(\{X(n), n \in N\}) \mid \{N_1 \mid N_2 \mid \ldots \mid N_H\} \,]$ (4)

and (3) choosing the most similar partition. Obviously, the
computational complexity of this algorithm is at least on the
order of the number of partitions, $\sim H^N$, and denoting the
complexity of maximization of eq.(4) by $C_1$, the complexity
of the MHT algorithm can be written as

complexity(MHT) $= C_1 H^N$ (5)
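
The combinatorial growth can be seen in a brute-force sketch of the three steps just listed. The setting is an assumption chosen for brevity: scalar data, Gaussian models with a shared fixed width, and the sample mean of each subset as the maximum likelihood estimate of its parameter. The function name `mht_best_partition` is hypothetical. The score is the log-likelihood up to an additive constant that is the same for every partition.

```python
import itertools


def mht_best_partition(data: list[float], n_models: int, sigma: float = 1.0):
    """Exhaustive search over all H**N hard assignments of data to models."""
    best_score, best_labels, tried = float("-inf"), None, 0
    # Step (1): enumerate every assignment of the N data points to H models.
    for labels in itertools.product(range(n_models), repeat=len(data)):
        tried += 1
        score = 0.0
        for h in range(n_models):
            members = [x for x, lab in zip(data, labels) if lab == h]
            if not members:
                continue  # an empty category contributes nothing
            # Step (2): maximum likelihood estimate of S_h for this partition.
            mean = sum(members) / len(members)
            score += sum(-0.5 * ((x - mean) / sigma) ** 2 for x in members)
        # Step (3): keep the most similar partition seen so far.
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels, tried


labels, tried = mht_best_partition([0.0, 0.2, 5.0, 5.2], n_models=2)
```

With four points and two models the loop body runs 2**4 = 16 times; each additional data point doubles that count, which is the growth expressed by eq.(5).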

Newly  emergent  mathematical  methods  of  solving  the 
problem  of  intelligence  while  avoiding  the  combinatorial 
complexity  of  MHT  include  multiscale  hierarchical 
organization  and  evolutionary  computation. 

2.3.  Fuzzy  Partitions  and  Modeling  Field  Theory  (MFT) 

A newly proposed method of solving this problem utilizes
fuzzy adaptive logic [Perlovsky, 1995; 1996]. Let us introduce
fuzzy class memberships $f(h \mid n)$, associating each data vector n
with each category h:

$f(h \mid n) = r(h)\, l(X(n) \mid h) \,/ \sum_{h' \in H} r(h')\, l(X(n) \mid h')$. (6)
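
The fuzzy class memberships of eq.(6) can be computed directly. The sketch below assumes scalar data and one-dimensional Gaussian partial similarities, as an illustration only; the helper `hard_label`, which picks the category with the largest membership, is a hypothetical name for the step that turns a fuzzy partition into a non-fuzzy one.

```python
import math


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """One-dimensional Gaussian density, used here as the partial similarity l."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def memberships(
    x: float, rates: list[float], means: list[float], sigmas: list[float]
) -> list[float]:
    """f(h | n) = r(h) l(X(n) | h) / sum over h' of r(h') l(X(n) | h')."""
    weighted = [r * gaussian_pdf(x, m, s) for r, m, s in zip(rates, means, sigmas)]
    total = sum(weighted)
    if total == 0.0:
        # Every density underflowed: the point is too far from all models.
        raise ValueError("data point has zero similarity to every model")
    return [w / total for w in weighted]


def hard_label(
    x: float, rates: list[float], means: list[float], sigmas: list[float]
) -> int:
    """Index of the category with the largest membership for this data point."""
    f = memberships(x, rates, means, sigmas)
    return max(range(len(f)), key=lambda h: f[h])


# A point halfway between two equally weighted models belongs equally to both.
halfway = memberships(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The memberships of one data point always sum to one, so each point is shared among the categories rather than assigned to exactly one of them.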

Eq.(6) looks like the Bayes formula for a posteriori
probabilities, and $f(h \mid n)$ can be interpreted as probabilities if
likelihood is used as a similarity measure and when the
parameters of the models are accurately estimated. Upon
convergence of the estimation procedure described below, the
memberships f converge to the estimated probabilities. Let us specify the
internal dynamics of the Modeling Fields (MF) as follows,

$df(h \mid n)/dt = f(h \mid n) \sum_{h'=1}^{H} \{ [\delta_{hh'} - f(h' \mid n)] \cdot [\partial \ln l(n \mid h') / \partial M_{h'}] \, \partial M_{h'} / \partial S_{h'} \cdot dS_{h'}/dt \}$, (7)

$dS_h/dt = \sum_{n \in N} f(h \mid n) \, [\partial \ln l(n \mid h) / \partial M_h] \, \partial M_h / \partial S_h$, (8)

here

$\delta_{hh'}$ is 1 if h = h', 0 otherwise. (9)

Parameter t is the time of the internal dynamics of the MF
system. The following theorem was proved.

Theorem. Equations (7) through (9) define a convergent
dynamical system MF with stationary states defined by
$\max_{\{S_h\}} L$.
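
A rough numerical sketch of the parameter dynamics of eq.(8) is given below. It rests on several simplifying assumptions that are not in the text: the data are scalars, each model is a Gaussian whose only adaptive parameter is its mean (so the model equals its parameter and the derivative of ln l with respect to the model is (x - S_h) / sigma^2), the width and the weights r(h) are held fixed, time is discretized with a plain Euler step, and the memberships are recomputed from eq.(6) at every step instead of being integrated through eq.(7). The name `mf_relax` is hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def memberships(x: float, rates: list[float], means: list[float], sigma: float) -> list[float]:
    """Fuzzy class memberships of eq.(6) for one data point."""
    weighted = [r * gaussian_pdf(x, m, sigma) for r, m in zip(rates, means)]
    total = sum(weighted)
    return [w / total for w in weighted]


def mf_relax(
    data: list[float],
    means: list[float],
    rates: list[float],
    sigma: float = 1.0,
    dt: float = 0.05,
    steps: int = 200,
) -> list[float]:
    """Euler integration of dS_h/dt = sum_n f(h|n) * (x_n - S_h) / sigma**2."""
    means = list(means)
    for _ in range(steps):
        velocity = [0.0] * len(means)
        for x in data:
            f = memberships(x, rates, means, sigma)
            for h, mean in enumerate(means):
                velocity[h] += f[h] * (x - mean) / sigma ** 2
        means = [m + dt * v for m, v in zip(means, velocity)]
    return means


data = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
relaxed = mf_relax(data, means=[1.0, 4.0], rates=[0.5, 0.5])
```

Each step costs one pass over the N data points for each of the H models, in contrast to the enumeration of partitions in the MHT approach. The step size `dt` has to be small relative to sigma^2 / N for the Euler scheme to remain stable.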

It follows that the stationary states of an MF system give
the maximum similarity solution of the model-based pattern
recognition problem. When likelihood is used as a similarity measure,
the stationary values of parameters $\{S_h\}$ are asymptotically
unbiased and efficient estimates of these parameters [Cramer,
1963]. The computational complexity of the MF method is
linear in N,

complexity(MF) $= C_2 H N$, (10)

where $C_2$ is defined by the relaxation time of the MF system
and is likely to be independent of N (or to depend on it only weakly). In
this way the MF system solves the problem of the
exponential complexity associated with the MHT method,
eq.(5). Fuzzy class memberships $f(h \mid n)$ define a fuzzy
partition of set N, which can be used to define a non-fuzzy
partition of the type of eq.(4): $\forall n,\ \max_h f(h \mid n) \Rightarrow n \in N_h$,
that is, each n is assigned to the category h with the largest membership.

MFT describes an intelligent system composed of
multiple  adaptive  intelligent  agents:  each  category-model  is 
"dormant"  until  activated  by  a  high  similarity  value.  Every 
piece  of  data  may  activate  several  categories,  which  "compete" 
with  each  other,  while  adapting  to  the  new  piece  of  data. 
Adaptation  and  other  actions  by  activated  categories  are  largely 
independent  from  most  other  categories,  except  for  those 
competing  with  each  other.  Most  important  actions  by 
activated  categories  are  transmission  of  messages  that  include 
category  parameters  and  sustaining  the  loop  of  adaptation  of 
their own parameters. Evolutionary computation is naturally
incorporated within MFT: transmitted messages, including
category parameters and other aspects of category-models, are
strings, the objects of evolutionary computations, and are employed
by modeling field systems for adaptation of their structural
components that could not be incorporated in a continuous
fashion and are not the subjects of adaptive MFT loops. The
overall  architecture  of  MFT  categories  can  combine 
heterarchical  and  hierarchical  organization. 

3.    MFT  AND  KANTIAN  THEORY  OF  MIND 

3.1.  The  Role  of  Kant's  Theory 

What is intelligence? It is still shrouded in mystery.
A lot is understood, but much is still unknown and, most
likely, will remain unknown for a while. Intelligence is
attributed to natural and artificial systems, and some of these
systems are very simple, while others are very complex. At
lower levels, intelligence is an ability to sense the
environment and to control the body (machinery) toward
achieving a few predetermined goals. At higher levels,
intelligence  includes  abilities  of  thinking,  including 
recognition  and  formation  of  concepts,  developing  complicated 
internal  representations  of  the  outer  world  and  self, 
understanding,  language  ability;  planning  behavior,  including 
direction  of  attention,  definition  of  goals  and  subgoals  and  the 
ways  to  achieve  them;  acting  within  itself  and  in  the  outer 
world;  an  ability  of  judgment,  including  feeling  and  emotions; 
abilities  of  intuition,  learning,  consciousness,  creativity,  and  a 
mysterious  feeling  of  freedom  of  will. 

Kant overturned the understanding of the relationship
between the mind and world by considering the specific a
priori contents of mind that enable its functioning. He
brought the philosophy of Pure Spirit close to the scientific
method. He developed a rational explanation of mind as a
system, if not in its entirety, still in its most interesting
"higher" intellectual abilities. Many aspects of Kant's theory
were further developed by a number of philosophers and
psychologists, including Schopenhauer, Hegel, Nietzsche,
Freud, Jung, Bergson, Jaspers. Still, the original theory of
Kant remains unsurpassed in its comprehensive treatment of
mind as a system. And, mathematical theories of intellect
have remained far removed from the penetrating depth of his
understanding and are inadequate for coming even close to the
breadth of his analysis. This does not have to be so, for Kant's
analysis is rational and therefore can perfectly serve as a
foundation for developing the mathematical theory of mind. A
first step toward rectifying this deficiency of the mathematical
theories of intellect is undertaken here.

In  three  volumes  on  Critique  of  Pure  Reason,  Critique  of 
Judgment,  and  Critique  of  Practical  Reason,  Kant  explained  a 
wide  variety  of  intellectual  experiences  based  on  three 
fundamental  abilities  or  faculties  of  mind:  Understanding, 
Judgment,  and  Reason.  Each  is  based  on  a  specific  a  priori 
principles  or  instincts  contained  in  mind:  concepts, 
correspondence  between  concepts  and  manifold  of  sensory  data, 
and  will  or  desire.  Understanding  is  a  faculty  of  concepts,  a 
source  of  general  notions.  Judgment  is  an  ability  to  see  that  a 
particular  case  comes  under  the  general  rule.  And,  Reason  is 
an  ability  to  draw  conclusions  that  is  to  generate  behavior. 
(The  most  important  type  of  behavior,  interwoven  with  higher 
intellectual  abilities  and  emotions,  Kant  considered  to  be  the 
behavior  of  learning.)  These  three  abilities  correspond  to  the 
three  main  modes  of  consciousness:  knowledge  (of  concepts), 
feeling  (of  correspondence  between  concepts  and  outer  world), 
and  desire  (to  act).  Even  so  Kant  devoted  a  separate  book  to 
each  ability,  they  should  be  combined  within  a  dynamical 
system  constantly  exercising  all  three  abilities  in  their 
interaction.  This  paper  makes  a  step  toward  developing  a 
mathematical  apparatus  of  Kantian  theory.  We  will  see  that 
MFT  contains  seeds  of  mathematical  modeling  of  the  three 
main  elements  of  intelligence  identified  by  Kant.  MFT  carries 
Kantian  analysis  further:  it  is  a  dynamical  system,  in  which 
the  three  abilities  identified  by  Kant  exist  in  the  process  of 
constant interaction, as it were, in a "vortex". This vortex
models learning of a category as a dynamical formation of a
symbol. I review some of the higher intellectual abilities,
along  with  attempts  at  their  rational  explanation  and 
mathematical  modeling. 

Emotions and perception of beauty are fundamental to the
human mind, in everyday life, the arts, and the sciences alike. Still,
the concept of beauty is mystifying. The first step towards a
mathematics of beauty is made here. It is founded on the
relationship between MFT and the Kantian theory of mind.
The will for learning modeled in MFT is, according to Kant, the most
important will; it serves as a basis for emotional
intellectual abilities, for the beautiful and the sublime.

3.2  Understanding  Is  Based  on  Internal  Models 

An internal model is a basis of intelligence. Even at the
lower levels, say, of a lobster sensing and grabbing food, with
the axons of sensing cells "wired" directly into the neurons
that control muscles, we can talk about an internal model,
because the signal that a "food-sensing" neuron sends to a
"muscle-neuron" indicates the lobster's internal representation
of food. There is no such thing as "food" in the ocean; "food"
is a dynamic process of interaction between an object in the
ocean, a sensing neuron (that forms an internal
representation-signal), a grabbing neuron, and other relevant neural aspects of
the lobster's experience. A lobster's mind has literally few
neurons, and if our final goal were modeling a lobster's
mind, we would proceed directly to studying its wiring diagram
without such nebulous and not obviously useful concepts as
a lobster's internal models.

Our  aim,  however,  is  to  understand  and  model  higher 
levels  of  intelligence.  At  higher  levels,  a  complete  "wiring 
diagram"  of  a  neural  system,  even  if  available,  would  be  so 
complicated that it would not furnish an understanding of the
basic  principles  of  mind.  A  significant  part  of  the  brain  is 
involved  with  internal  models  (storing,  updating,  and  using 
them).  Our  ability  to  recognize  concepts,  even  simple  ones, 
such  as  objects,  is  due  to  internal  models  or  representations  of 
concepts.  Understanding,  first  of  all,  consists  of  concepts  in 
our  mind  along  with  their  interrelationships.  Higher  levels  of 
understanding,  such  as  understanding  of  meaning,  involve  a 
complicated  internal  model  composed  of  a  large  number  of 
submodels-concepts  with  multiple  interconnections  among 
them.  Possibly,  every  particular  phenomenon  of 
understanding-meaning  only  exists  within  a  limited  domain, 
within  a  certain  situation  or  with  regard  to  a  certain  goal. 
Then,  meaning  of  a  concept  is  modeled  by  including  this 
concept  within  a  set  of  situationally  relevant  other  concepts 
and  goals.  Meaning  requires  a  hierarchical  system:  the 
understanding  of  a  meaning  of  a  concept  requires  a  point  of 
view  from  the  next  levels  in  a  hierarchy,  above  the  level  of  the 
concept's  inner  model  and  its  recognition.  Thus,  the  meaning 
of  a  lower  level  concept  is  included  into  a  higher  level 
concept.  However,  relationships  among  levels  are  not  rigidly 
fixed:  formation  of  certain  concepts  involves  multiple 
hierarchical  levels,  and  relative  position  of  concepts  in  the 
hierarchical  levels  might  be  situationally  dependent.  Thus, 
heterarchical hierarchy might be a better term. The explanation of
mind as based on a priori inner models goes back to Plato and
Aristotle. Kant identified a priori inner models as a separate
faculty  of  mind  that  he  called  Understanding.  Mind's 
operations  with  a  priori  concepts  Kant  calls  the  domain  of 
Pure  Reason. 

The  main  question  that  the  analysis  of  Pure  Reason  shall 
answer,  according  to  Kant,  is  "How  are  synthetic  judgments  a 
priori  possible?"  This  ability  Kant  identified  as  a  specific  a 
priori  faculty  of  pure  reason.  In  our  theory  of  mind,  this 
specific  faculty  is  represented  in  hierarchical  models:  next 
levels  in  a  hierarchy  contain  synthesis  of  the  lower  level 
concepts.  This  synthesis  is  of  the  a  priori  nature,  because  the 
hierarchical  structure  of  the  internal  model  is  of  the  a  priori 
origin.  Thus,  development  of  hierarchical  models  is  a  key  to 
mathematical  modeling  of  the  Understanding  and  Pure  Reason. 
Making  this  hierarchy  adaptable  and  situationally  dependent  is 
an  additional  challenge. 


Explanation and modeling of the phenomena of meaning
and understanding also require including them within the behavior
generation and acting of an intelligent system. The acting
could be inside, within an intelligent system, or outside of the
intelligent system, into the outer world. Actions
corresponding to the goal or situation (internal or external)
constitute a part of meaning. There is also another aspect of
acting  out  in  the  external  world  noted  by  Freeman,  who 
introduced  a  concept  of  external  representations  in  the  world. 
Our  external  acts  and  their  results  (being  perceived  by 
ourselves  and  others)  from  gestures  and  utterances  to  our  entire 
culture,  as  it  exists  in  the  outside  world,  are  external 
representations.  To  the  extent  that  external  representations  are 
included  into  the  Kantian  cycle  of  category  formation,  they  can 
be  viewed  as  parts  or  extensions  of  our  internal  models. 
Computer  simulations  are  a  perfect  example  of  such 
extensions  of  internal  models. 

According  to  Kant,  Logic  gives  laws  of  understanding,  or 
laws  of  relationships  among  a  priori  categories.  Here,  in  the 
world  of  Ideas,  there  is  a  significant  domain  of  applicability  of 
the Aristotelian logic. For example, an internal model
category of an object is either that of a target or not, according
to the Aristotelian law of the excluded third (and this logic is
different from the fuzzy logic of judging which real signal belongs
to which category). This domain of the Aristotelian logic
encompasses non-adaptive aspects of the a priori models. To
the extent that the a priori models can adapt, they are fluid,
non-Aristotelian, fuzzy. Development of adaptive hierarchies
of  models  is  a  challenge  for  future  research. 

To  summarize  the  MFT  relationship  to  Kant's 
Understanding:  MFT  ability  for  Understanding  or  forming 
concepts  is  due  to  a  priori  internal  models,  and  an  ability  for 
synthetic  judgments  a  priori  is  due  to  an  a  priori  hierarchy  of 
models  and  relationships  among  them. 

3.3  Judgment  Is  Based  on  Similarity  Measures 

For  internal  models  or  concepts  to  be  useful,  there  should 
be  a  way  of  relating  them  to  experience.  In  other  words,  we 
should  be  able  to  recognize  individual  phenomena  according  to 
general  concepts  and  to  decide  which  empirical  facts  correspond 
to  which  concepts.  Kant  called  this  ability  Judgment  and 
considered  it  as  one  of  the  three  main  abilities  of  mind. 
Judgment  is  an  ability  to  see  that  a  particular  case  comes  under 
the general rule. In MFT, Judgment is mathematically
represented by similarity measures. MFT contains a measure
of similarity between the internal model and the world, as well
as between each submodel-concept-agent and a particular
subset of input data. This is an a priori property of MFT and,
according to Kantian analysis, it is an a priori property of our
mind.

Kant  differentiates  determinant  and  reflective  aspects  of 
Judgment.  Finding  particular  subsets  of  sensory  data 
corresponding  to  a  specified  concept-submodel  is  the 
determinant  Judgment.  And,  finding  the  concept 
corresponding to the data is the reflective Judgment. MFT
contains mechanisms of both determinant and reflective
Judgment. Within the iterative loop of MFT adaptation,
determinant Judgment is given by the association
(segmentation) of data with submodels, and reflective
Judgment is given by selecting the concept-submodel most
similar to the data segment (subset). In MFT both are given
by the fuzzy associations, f(k|n), with n designating data and k
designating the concept-submodels.
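
The two directions of Judgment can be illustrated with a minimal sketch. The form of the associations is not given in this section, so the sketch assumes that f(k|n) is a similarity normalized over the submodels k, and that each submodel is a one-dimensional Gaussian with a known mean. The function names, the threshold of 0.5, and the toy data are hypothetical and serve only as illustration.

```python
import math

def gaussian_similarity(x, mean, sigma=1.0):
    """Assumed similarity of datum x to a submodel: a 1-D Gaussian density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fuzzy_associations(data, means, sigma=1.0):
    """Return f[k][n]: similarities normalized over submodels k for each datum n."""
    f = [[0.0] * len(data) for _ in means]
    for n, x in enumerate(data):
        sims = [gaussian_similarity(x, m, sigma) for m in means]
        total = sum(sims)
        for k, s in enumerate(sims):
            f[k][n] = s / total
    return f

def determinant_judgment(f, k, threshold=0.5):
    """Indices of the data associated with a given concept-submodel k."""
    return [n for n, weight in enumerate(f[k]) if weight > threshold]

def reflective_judgment(f, n):
    """Index of the concept-submodel most similar to datum n."""
    return max(range(len(f)), key=lambda k: f[k][n])

# Toy data: three points near 0 and two points near 5 (hypothetical values).
data = [0.1, -0.2, 0.3, 4.9, 5.2]
means = [0.0, 5.0]
f = fuzzy_associations(data, means)
segment_of_model_0 = determinant_judgment(f, 0)
concept_of_last_datum = reflective_judgment(f, 4)
```

Both judgments read the same table f: the determinant one scans a row (fixed concept, varying data), and the reflective one scans a column (fixed datum, varying concepts).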

Why is Nature in its manifold knowable to our mind?
Is  it  due  to  a  specific  property  of  Nature,  or  due  to  a  specific 
property  of  mind?  In  other  words,  what  makes  it  possible  for 
our  Understanding  and  Judgment  to  function  in  the  way 
described  above?  Kant's  answer  is  that  this  possibility  is  due 
to  a  special  a  priori  property  of  our  mind.  This  property  is  the 
purposiveness  of  our  internal  representations  (models). 
Understanding  and  Judgment  are  so  constructed  that  internal 
representations  of  empirical  events  and  objects  appear  to  us  as 
purposive  (the  purpose  includes  first,  a  correspondence 
between  our  internal  representations  and  the  world,  and  second, 
an  ability  to  learn  or  to  improve  this  correspondence).  This 
purposiveness  provides  a  foundation  for  the  development  of 
higher  faculties  of  mind  including  higher  emotions,  and  the 
notions  of  beautiful  and  sublime. 

A reader might wonder if this discussion is too
philosophical and irrelevant to a mathematical theory of mind.
The  relevancy  of  this  discussion  is  in  that  it  guides  us  in 
constructing  internal  models,  measures  of  similarity,  and  in 
developing  evolutionary  theories  explaining  these  abilities. 
The  models  and  similarities  are  constructed  so  that  they  have  a 
purpose  or  meaning  within  the  intelligent  system,  which  is 
the  mathematical  description  of  the  intentionality  of  the 
intellect.  This  intentionality  includes  the  correspondence  to 
the  world  and  adaptivity  that  provides  for  learning.  And,  it  is 
needed  so  that  the  "lower  level"  instincts  for  survival,  for 
performing  specific  tasks,  etc.  can  be  more  efficiently  satisfied 
(by  a  living  being  or  a  robot).  Intentionality  provides  a 
background  for  a  mathematical  theory  of  higher  faculties  of 
mind,  including  a  possibility  for  mathematical  treatment  of 
beautiful and sublime. And an evolutionary theory has to
lead to these abilities.

To  summarize  the  MFT  relationship  to  Kant's  Judgment: 
MFT  ability  for  Judgment  is  due  to  similarity  measures  and 
fuzzy  category  memberships,  which  select  data  corresponding 
to  the  categories  of  Understanding  and  select  categories 
corresponding to the data, in the iterative cycles of every MFT
loop-agent.

3.4  Reason  Is  Based  on  Similarity  Maximization 

Judgment  mediates  between  concepts  of  Understanding  and 
concepts  of  Reason  (will,  and  freedom).  In  particular, 
reflective  Judgment  ascends  from  particular  to  universal,  from 
sensory  data  to  concepts.  Its  principle  is  an  ability  to  learn, 
which  Kant  called  the  purposiveness  of  intellect  towards  the 
object.  This  ascendance  from  data  to  concepts  is  practically 
realized  by  Reason.  In  MFT  functioning,  finding  a  submodel 
corresponding  to  a  piece  of  data  (Judgment)  is  followed  by 
adaptive  modification  of  the  model,  which  is  the  act  of  will 
according  to  the  learning  principle  (law)  of  Reason.  Reason 
provides laws for behavior, and MFT paradigms considered in
previous chapters were concerned with one type of behavior:
learning  behavior  as  adaptation  of  the  internal  model. 
Modification  of  models  in  MFT  is  governed  by  the  principle 
of  maximum  similarity  between  the  model  and  data.  The 
MFT  parameter-adaptation  equations  maximizing  the 
similarity  give  the  laws  of  Reason.  Thus,  MFT  provides  for 
a  mathematical  description  of  a  will  for  learning,  a  will  for 
improvement  of  its  internal  representations  of  the  world  and 
the  laws  of  Reason  governing  this  will. 
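
A hedged sketch of such a loop follows. The MFT parameter-adaptation equations are not reproduced in this section, so the sketch swaps in a simple stand-in: one-dimensional Gaussian submodels with fixed width, whose means are re-estimated as association-weighted averages of the data (an EM-style update). All names, starting values, and data are hypothetical.

```python
import math

def associations(data, means, sigma=1.0):
    """Judgment step: f[k][n], Gaussian similarities normalized over submodels."""
    f = [[0.0] * len(data) for _ in means]
    for n, x in enumerate(data):
        sims = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]
        total = sum(sims)
        for k, s in enumerate(sims):
            f[k][n] = s / total
    return f

def log_similarity(data, means, sigma=1.0):
    """Total log-similarity between the model and the data, up to a constant."""
    return sum(
        math.log(sum(math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means))
        for x in data
    )

def adapt(data, means, iterations=20, sigma=1.0):
    """Alternate Judgment (associations) and Reason (parameter update)."""
    history = [log_similarity(data, means, sigma)]
    for _ in range(iterations):
        f = associations(data, means, sigma)
        # Reason step: move each mean to the association-weighted data average.
        means = [
            sum(w * x for w, x in zip(f[k], data)) / sum(f[k])
            for k in range(len(means))
        ]
        history.append(log_similarity(data, means, sigma))
    return means, history

# Hypothetical data with two clusters, and deliberately poor starting means.
data = [0.1, -0.2, 0.3, 4.9, 5.2, 5.0]
fitted_means, history = adapt(data, means=[1.0, 3.0])
```

In this stand-in, the recorded similarity does not decrease from one iteration to the next, which is the sense in which the update "maximizes similarity"; the actual MFT equations may differ in form.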

Kant emphasized the fundamental nature of the antinomy
between causality and freedom and severely criticized
philosophers who underestimated the difficulty of the
causality-freedom antinomy. His criticism still applies
to a researcher who is too cavalier about resolving this
antinomy. The fundamental source of difficulty is that
freedom is the opposite of randomness. Freedom supposes
causality. If there is no causality, there could be no freedom.
But if the world's laws are causal, how could freedom be
explained? Kant made a step towards resolving this antinomy.
He assigned the concept of causality to Understanding, where
causality is an a priori concept for understanding nature, the
world of phenomena. And he assigned the concept of freedom
to Reason, where it is an a priori concept governing human
desire and will. According to Kant, freedom belongs to a
noumenal world; it originates from the unknowable nature of a
human-in-itself. A next step toward resolving this antinomy
should be attempted by identifying the unknowable
human-in-itself with our unconscious and developing a physical theory
of conscious and unconscious aspects of mind.

4.    EMOTIONAL  MACHINE  (TOWARD 
MATHEMATICS  OF  BEAUTY) 

In 1787 Kant discovered a new type of a priori principle,
which led to his formulation of the faculty of Judgment that unified
his philosophy. Kant came to the conclusion that Judgment is
based on the feeling of pleasure caused by the harmony or
correspondence between our internal representations-concepts and
empirical phenomena. The new principle governs "intellectual
emotions", which are not separated from thoughts, but are
combined together in a dynamical process of Kant-MFT
loop-agents. The mathematical apparatus describing Judgment in MFT
is given by similarity measures. And a thought process is a
loop, a vortex of concepts, emotions, and adaptation actions.

Higher emotions are related to an ability for the
perception of beauty, which is a universal and fundamental property
of the human mind. It is important not only in the field of fine
art; it pervades human experience. And there are
well-known statements by famous scientists, explaining that the
first and foremost test of a scientific theory is its beauty. But
mathematical attempts to model mind, so far, have not
touched the subject of beauty, and the directions along which
this could be attempted seem to be hidden in mystery and not
accessible to scientific investigation. Here, I attempt a first
step in this direction. In this, I have found that I have been
aided most of all by Kant, who analyzed in scrupulous detail
the rational mechanisms of beauty and other higher
emotional faculties.

When designing an intelligent system, say a robot, we
decide what kinds of objects the robot should be able to
recognize and we supply the robot with the internal models of these
objects. From the robot's perspective, only those types of objects
exist which it can recognize. Every object has a purpose of
being recognized (in addition to any other purpose the robot
may put this object to). A universal purpose of any object is
its concept: for an object to have any purpose for a particular
intelligent system, the object's concept has to exist in the
system. This is a design principle of any intelligent system.
And this design principle is applicable to us: evolution
(or God) designed us so that we can find our way around those
objects which we recognize in nature. The basic principle
of the design is that nature appears to us as purposive. The
purposiveness of nature is the a priori part of our
representations and it harmonizes nature with our desire for knowledge
and produces the feeling of pleasure (or pain, if chaos is
encountered).

Knowledge about objects comes from experience and from
the a priori concepts (Understanding). The role of Judgment in
this process was discussed in the previous section: it is an
objective, or cognitive, aspect of Judgment. In this section, we
concentrate on the subjective aspect of Judgment, which relates
to the subject and not to the object. This subjective aspect is
the satisfaction and feeling of pleasure that is bound with the
harmony between our internal representations and an object.
Kant calls this the aesthetical aspect of Judgment, because it
relates more to emotions than to cognition (even though all aspects
are combined in every act of perception and cognition). This
aesthetical aspect of Judgment is related to the "pure"
purposiveness of our representations, which is separate from any
specific purpose that an object can be used for, and includes
only the knowledge itself. Thus, android-robots capable of
self-learning have to be designed so that they have an
aesthetical affinity to knowledge. In MFT this is given by the
similarity, l(X|M), that relates a particular case X to the general
concept M, without any further specific purpose the object can
be used for.

To  the  extent  that  the  purposiveness  is  felt  in  its  pure 
form  and  is  bound  to  its  a  priori  nature,  the  object  is  called 
beautiful.  The  nature  of  beauty  is  related  to  an  interest  not  in 
the  object,  but  in  the  subject:  what  I  make  out  of  this 
representation  in  myself.  Beautiful  is  what  coincides  with  the 
purpose  of  acquiring  more  knowledge  and  improving  the 
harmony  between  the  internal  representations  and  Nature. 
Kant discusses two higher intellectual aesthetical abilities:
feelings of the beautiful and the sublime. The Beautiful involves the
relationship between Judgment and Understanding, and the Sublime
involves the relationship between Judgment and Reason. The
feeling of the Sublime moves Reason to act toward
improvements of internal representations. MFT provides a
foundation for the mathematical description of these abilities:
similarity performs both of these functions; it establishes
relationships among data and models (concepts of
Understanding), and it activates actions of adaptation toward
improving the harmony between the models and nature.


Is it possible to write an equation of the beautiful? And
should it be the goal of our study? Could there be an equation
that tells the difference between Rembrandt, Warhol, and a
casual recreational artist? Of course not. The purpose here is
to demonstrate that there is a possibility for a mathematical
theory of mind, in which concepts of emotion and beauty have a
place. But then, why not proceed directly to producing by
means of MFT even the simplest example of something
beautiful? What difficulty prevents us? The
difficulty is in the adaptive nature of beauty. Kant got himself
in trouble with later readers and admirers by his attempts to
provide examples of what is beautiful and what is not. His
examples (such as that drawings could attain pure beauty
and paintings could not) were immediately criticized and he was
branded as having undeveloped aesthetical taste and worse.
One of the reasons is that what was beautiful thousands of years
ago is not necessarily beautiful today. Concepts of
Understanding evolve, and those concepts that were useful
some time ago, in that they captured important aspects of
nature and provided an evolutionary advantage to those who
possessed them, are not necessarily useful any longer; they have lost
their purposefulness. Within our evolving internal models,
some concepts may become commonplace, outdated, empty of
useful content, and contradictory to newer, better adapted
concepts. Since, mathematically, beauty is related to the
harmony between the internal model and nature, it changes
with time. What is an excellent harmony between an adaptive
model and data in an engineering system, such as considered in
[Perlovsky, 1994; Perlovsky et al., 1997], is a very simple
construct, unworthy of the word "beauty" in the context of our
mind. In order to design android-robots capable of the
human-level perception of beauty, even at a rudimentary level, their
internal models have to be much more complicated than
those currently used in engineering applications. But this is
changing quickly. And, possibly, we will see complicated robots,
which would have to learn for many years, as humans do, and
acquire individual subjective experience, and their perception of
beauty will acquire human-like, individual features.

Abstract. Computer engineering proposes the
construction of complex systems by dynamic
prototyping (Buddie and Bacon, 1992). But this
prototyping cannot be inductive and purely considered as
a trial-and-error process. To be successful, one must
possess an underlying hypothetical model (Marr, 1982)
of what the functions of the system are. If these
functions relate to physical tasks, such as sensing
temperature, manipulating objects, etc., the desired
behavior can be observed, and a model can be built.
Conversely, if the functions of the system are to be
applied to semio-informational tasks, such as language
translation, information retrieval, hypertext navigation,
text generation, etc., the interpretative behavior is not
readily observable. Now, like any other computer
systems, these systems are symbol manipulation
machines (Newell, 1980). They must also manipulate
inputs and outputs, but, in themselves, these data are
semiotic objects, and not physical ones. These systems
manipulate objects that have to be interpreted by some
cognitive agent. In other words, systems that
manipulate physical objects require a model of the
physical world, while systems that manipulate
informational objects require a semiotic model. In this
paper, we illustrate how a semiotic model can help in
the conception, the modeling, and the experimentation
of a semiotic behavior such as Computer Assisted
Reading and Analysis of Text (CARAT), and how this
model has called upon the Genetic Algorithm (GA)
theory to realize some of its aspects.

I.  Presentation  of  CARAT 

I.1 General presentation

Computer Assisted Reading and Analysis of Text is
the computer technology that offers readers
assistance in attaining some aspects of the
informational or semiotic content of a text (discursive,
lexical, hypertextual, thematic, stylistic, etc.). So,
CARAT definitely relates to interpretative actions.
It is in no way a robot that reads or understands a text
by itself.

One of the classical models of text interpretation is the
philological one¹. Through the centuries, thousands of
readers, exegetes, and interpreters have practiced this
method. Because of the quality of its principles, it has
acquired compelling recognition and the weight of
experience. The basic principle of the philological
perspective is that one can construct relatively
systematic procedures capable of ensuring rigor in text
interpretation. As a matter of fact, philology is an
instantiation of an interpretative semiotic process
applied to the processing of textual signs. It takes sets
of signs (a text) as its input, then classifies and categorizes
them, explores and selects them, and produces a new set
of signs - the commentaries - as its output. This
interpretation process can be translated functionally in
terms of (a) inscription, (b) classification, (c)
exploration, and (d) configuration of information
(Seffah and Meunier, 1994). In its principles, three
important dimensions can be emphasized: text reading
and analysis is a systematic, dynamic, and plastic
behavior. Systematicity pertains to the controlled
processing of information; dynamicity concerns the
interaction of the analyst with the text; and plasticity
allows the constantly renewed interpretation of the text.

¹ A certain number of researchers using information
technologies are beginning to place themselves in this
philological perspective (cf. Thrane et al., 1992).

In order to respect this particular type of
interpretation process, a computer model must rely on
an open architecture. It must allow an information
processing flow that is systematic, dynamic, and plastic.
Each processing flow will be built out of interactive advances
and restarts which sometimes are autonomous,
sometimes are interrelated, but which all aim at
assisting the reader and analyst in penetrating the
content of the information. Hence, again, a CARAT
system is not a robot reader, but a faithful assistant in
reading and analyzing texts. In this perspective, CARAT
is defined as the set of serial or parallel operations
which, with the assistance of the computer, construct
interpretative paths in which each moment produces a
new textual object to be classified, explored, and
configured.

I.2 CARAT and classification

There exists an infinity of possible CARAT
processing flows. Each text, for each person, can be read
and analysed in many ways. One can, for instance,
inquire into a particular theme, paraphrase and summarize
a specific segment, study the lexicon, evaluate the style,
retrieve information, build a thesaurus or an index of the
content, and so forth. Among all the operations at work
in each of these processing flows, we will study more
particularly the classification process. This process is
important since interpretation always requires some type
of classification of the incoming signs, symbols, or
information.

In the field of text processing, there exist many
strategies of classification. Some classical strategies
are of a logico-symbolic type (e.g., Hobbs, 1993; Sowa,
1991) or a semantico-linguistic type (Rastier, 1987;
Bertrand-Gastaldy et al., 1987, 1993). Others are statistical
(Church and Hanks, 1990; Reinhert, 1994; Lebart and
Salem, 1994; Pustejovsky, 1991; Wilks, 1996; Salton,
1989; etc.). Albeit very systematic, these approaches
lack dynamicity and plasticity. Learning is limited, and
they are very weak at processing a constantly
changing informational input, as is often the case
with textual data (for instance on the World Wide Web).
Finally, some can be referred to as "emergent
computation" models (Forrest, 1991). They include
Markovian fields (Kindermann and Snell, 1980;
Bouchaffra and Meunier, 1995), connectionism
(Rumelhart, 1986; Salton and Buckley, 1994), and
Genetic Algorithms, as shown in this article. Besides
their properties of statistical strength and generalization,
they are systematic like any other clustering strategy,
dynamic, and plastic (learning is possible).

Our purpose here is mainly to show how the
Genetic Algorithm approach to classification can be
applied to the problem of semiotic interpretation of text,
above all in the context of CARAT technology.
Although validation and experimentation of the GA
approach are not the main purpose of this paper, which is
modeling, some initial experiments will be reported.

II.  Genetic  Algorithms 

II.1 General presentation

The GA approach takes its inspiration from research
done on adaptive systems. This research sees such a
system as an agent that applies to a domain (called
the environment) specific operations which allow it
to act upon that domain in the most efficient manner. This
principle is of course based on the assumption that an
adaptive system is able to detect, or extract, from its
heteroclite domain any regularities which concern it, and
vis-a-vis which it must construct a plan of adaptation.

"The  adaptive  plan  determines  just  what  structures 
arise  in  response  to  the  environment,  and  the  set  of 
structures  attainable  by  applying  all  possible  operator 
sequences  marks  out  the  limits  of  the  adaptive  plan's 
domain." (Holland 1992:4)

In other words, a strategy of adaptation is the best
plan of action that a system could put into place in order
to identify the structures of its environment. In a more
traditional sense, it uses some type of pattern
recognition strategy in order to adapt. When applied to
the field of genetic reproduction of species, this strategy
of adaptation consists in finding, for a given
environment, groups of individuals chromosomically
best adapted. When constructed in a formal model, this
strategy translates into an algorithmic model called a
genetic algorithm. The notion of genetic algorithm,
presented for the first time by John H. Holland in 1975
(Holland, 1975; Holland, 1992), was considerably
developed during the 1980's and 1990's (Goldberg, 1989;
Rawlins, 1991; Varela and Bourgine, 1992;
Michalewicz, 1994).

The main function of this algorithm is the
production, from an original population, of a population
of individuals best adapted to an environment which
represents the constraints and particularities of the
problem dealt with. The degree of adaptation is evaluated
by means of a fitness function f. So, the GA is based
on:

- an encoding of information, situations, problems
and solutions in the form of strings of building blocks,
each string being breakable between any two blocks, in
the exact image of chromosomes, which constitute
veritable lists of the characteristics of an individual.
This encoding usually takes the form of a highly
structured binary string, of fixed or variable length
according to the type of problem;

- the capacity to reproduce such strings in large
number, which metaphorically relates to sexual
reproduction;

- the existence of a faculty of adaptation (simulated
by the function f) which permits the evaluation of the
quality of each individual created by the algorithm.

II. 2  Basic  cycle  of  a  GA 

In practice, a population $P^0$ of potential solutions
(the chromosomes) to the problem to be treated is
generated at the initialization step. Then the following
standard cycle, also called genetic search, is applied
repeatedly:

INITIALIZATION

While (stop test not verified) do

begin

    EVALUATION ; SELECTION ; REPRODUCTION ;
    REPLACEMENT

end

At any given moment t, the population is:

$P^t = \{a_1^t, \ldots, a_p^t\}$, where $a_i^t$ stands for the
candidate solution $a_i$ at cycle number t. The main
steps can be briefly described as follows:

1) EVALUATION: The elements of $P^t$ are rank
ordered from the most to the least fit according to the
selection probability $\mathrm{Prob}_s$:

$$\mathrm{Prob}_s(a_i^t) = \frac{f(a_i^t)}{\sum_{j=1}^{p} f(a_j^t)}$$

2) SELECTION: The elements which best satisfy
the constraints or characteristics of the solution sought
are selected according to the selection probability, and
arranged in pairs in order to prepare the next step.

3) REPRODUCTION: Genetic operators are then
applied to this population of elites, called parents, in
order to obtain an intermediary population $P'^t$. These
operators permit the creation of new strings, some of
which should have better fitness properties than
their parents. Two genetic operators are generally
employed: crossing-over, which produces two new
elements from two parent elements (Fig. 1), and
mutation, which creates new solutions that would have
been impossible to obtain by simple crossing. Mutation
consists of randomly selecting one of the bits of the
chromosome and changing its value with a pre-defined
probability (the probability of mutation) (Fig. 2).

4) REPLACEMENT: This is the generation of a
new population by replacing the worst elements of the
previous population $P^t$ by the best of $P'^t$. This new
population possesses at least as many, and sometimes
more, of the characteristics of the solution than the
preceding generation.
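The selection probability used in steps 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, fitness values are assumed non-negative, and the uniform fallback for an all-zero population is our own choice, not something the paper specifies.

```python
import random


def selection_probabilities(fitnesses):
    """Prob_s(a_i) = f(a_i) / sum_j f(a_j), assuming non-negative fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: fall back to a uniform distribution when every
        # individual has zero fitness (not covered by the text).
        return [1.0 / len(fitnesses)] * len(fitnesses)
    return [f / total for f in fitnesses]


def select_pairs(population, fitnesses, n_pairs, rng=random):
    """Draw parent pairs with probability proportional to fitness."""
    probs = selection_probabilities(fitnesses)
    pairs = []
    for _ in range(n_pairs):
        first, second = rng.choices(population, weights=probs, k=2)
        pairs.append((first, second))
    return pairs


print(selection_probabilities([1.0, 3.0]))  # [0.25, 0.75]
```

Sampling with replacement means a very fit individual may be paired with itself; whether the original implementation allowed this is not stated.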


[Fig. 1. Diagram of crossing-over: two parent bit strings
before cross-over and the two resulting strings after
cross-over.]

[Fig. 2. Mutation: one bit of the string is flipped
(0 <-> 1), p = 0.0001.]

This cycle is iterated many times until a generation
of optimal solutions is obtained, from which only the
best will be retained. In certain contexts it is the
entire population which stands in lieu of the solution.
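The whole cycle can be sketched on plain bit strings. The code below is a minimal illustration, not the authors' implementation: the stop test is a fixed number of cycles, replacement keeps the best individuals among parents and offspring (one possible reading of the REPLACEMENT step), and all names and default values are hypothetical.

```python
import random


def one_point_crossover(p1, p2, rng):
    """Cut both parents at the same random point and swap the tails."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(bits, p_mut, rng):
    """With probability p_mut, flip one randomly chosen bit."""
    bits = list(bits)
    if rng.random() < p_mut:
        i = rng.randrange(len(bits))
        bits[i] = 1 - bits[i]
    return bits


def genetic_search(fitness, length=16, pop_size=20, cycles=50,
                   p_mut=0.05, seed=0):
    rng = random.Random(seed)
    # INITIALIZATION: random population P^0
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(cycles):  # stop test: fixed number of cycles (assumption)
        # EVALUATION
        scores = [fitness(ind) for ind in population]
        # SELECTION: fitness-proportional; the small constant avoids
        # an all-zero weight vector
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # REPRODUCTION: crossing-over followed by mutation
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            c1, c2 = one_point_crossover(a, b, rng)
            offspring.append(mutate(c1, p_mut, rng))
            offspring.append(mutate(c2, p_mut, rng))
        # REPLACEMENT: keep the best pop_size individuals overall
        merged = sorted(population + offspring, key=fitness, reverse=True)
        population = merged[:pop_size]
    return population


# Toy fitness (an assumption for the demo): the number of 1-bits.
best = genetic_search(sum)[0]
```

Because replacement is elitist here, the best fitness in the population never decreases from one cycle to the next.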

III.  Genetic  Algorithms  and  CARAT 

III.  1  The  object  of  the  model 

As previously said, CARAT can fan out into many
different processing flows. In this section, we propose a
modeling, by means of a genetic algorithm, of the
classification moment of two specific processing flows:
the first flow aims at giving the reader hints on the
semantic contexts of particular words; the second flow
aims at automatically suggesting hypertext links
among segments of texts. So, the main concepts that
define GAs will be translated into the terms of
classifying segments of texts, and this theoretical
exposition will be followed by a presentation of some
experimental results.

In the CARAT context, a genetic algorithm consists
of finding, amongst the different segments of a text,
which ones offer some regular structures or form classes
of regularities. The GA is seen as a process of
classificatory treatment which identifies segments of
text containing some identical "type" of information.
These segments are most often pages, and contain
unifs (for units of information), which are simple or
compound words, lexemes, etc. The determination of
unifs and segments can be done by means of specialized
computer text analyzing programs such as SATO,
BOOKMANAGER, SPIRIT, OPEN TEXTE,
NATUREL, etc. Here one creates a lexicon, a linguistic
markup, a tagging, etc.

So, the original text is transformed into a set of
segments containing only a balanced and controlled
choice of units of information. A procedure identifies
the presence or absence of each unif in each segment,
and builds the following matrix (Tab. 1) of n segments
by m unifs.


          unif 1     unif 2     ...   unif m
seg. 1    Pre(1,1)   Pre(1,2)   ...   Pre(1,m)
seg. 2    Pre(2,1)   Pre(2,2)   ...   Pre(2,m)
...       ...        ...        ...   ...
seg. n    Pre(n,1)   Pre(n,2)   ...   Pre(n,m)

Table 1. Matrix segments-unifs


One particular segment i is represented by a row vector of
binary numbers given by the predicate Pre(i, j): 0 for
absence of unif j, 1 for presence.
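As an illustration, the matrix of Table 1 can be built as follows. This is a sketch under simplifying assumptions: segments are plain strings, unifs are single lowercase words, and tokenization is a whitespace split, whereas the paper relies on dedicated text analysis tools and lemmatisation. The function name is hypothetical.

```python
def presence_matrix(segments, unifs):
    """Return an n x m list of 0/1 values: Pre(i, j) = 1 if unif j
    occurs in segment i, 0 otherwise."""
    matrix = []
    for text in segments:
        tokens = set(text.lower().split())  # naive tokenization (assumption)
        matrix.append([1 if unif in tokens else 0 for unif in unifs])
    return matrix


segments = [
    "the child reads literature",
    "reading and the school",
    "the child at school",
]
unifs = ["child", "literature", "school"]
print(presence_matrix(segments, unifs))
# [[1, 1, 0], [0, 0, 1], [1, 0, 1]]
```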

III. 2  Genetic classification

The object of the model is to assign each segment
to a specific and unique class. The assignment of a
segment to a class is called classing, and uses a
classifier, whereas the process of searching for the best
classing (i.e., the best classifier) will be called
induction of classification.

III. 3  Set definition and initial population

In order to introduce, in the context of CARAT, the
concept of a population of individuals (Fig. 3), and
particularly that of the initial population, we must
appeal to three important sets:

a) the set T corresponds to the set of segments of
the text. Let $T = \{S_1, \ldots, S_n\}$

b) the set K of classes. Let $K = \{C_1, \ldots, C_{NbC}\}$

where NbC represents the number of classes. A
class is a set of segments which are not too distant from
one another according to the function of adaptation
defined below. At the initialisation step, the number of
classes is arbitrarily chosen large (it is equal to the
number of segments n) in such a way as to give the
process the freedom to construct as many classes of
segments as possible. It is the purpose of the function
of adaptation to reduce, during the genetic search, this
number to an optimal value. The number of cycles is
also arbitrarily fixed at a fairly large value, of the order
of one thousand cycles. Finally, the interpretation of
each class is left to the user.

c) the set (or population) P of the individuals ($P^t$
represents the state of the population at time t; it
contains a fixed number p of individuals). The elements
of P are called classifier-vectors, and represent the
candidate classifiers, i.e., the potential solutions to the
problem of the best classing. In other words, an
individual encodes a tentative solution for classing
segments. The size of the classifier-vectors is n; the
position number i corresponds to the i-th segment in
the text, and contains an integer equal to the number of
the class to which the segment belongs.

At the outset, the classifier-vectors are randomly
built to produce the initial population $P^0$.


[Fig. 3. Relations of sets T, K and P.]

The  genetic  search  has  the  task  of  carrying  out  a 
considerable  number  of  modifications  to  the  classifier- 
vectors,  such  as  recombinations  and  mutations,  in  an 
attempt  to  find  the  best  one  according  to  the  function  of 
adaptation. 
The j-th individual of P corresponds to the vector:

$V_j = (C_{j1}, \ldots, C_{jn}), \quad C_{ji} \in K$

An individual or chromosome could, as well, be
seen as a set of pairs which associates each segment
with a unique class. The individual thus also represents
a function which maps the set T of segments into the
set K of classes.

Example:

Let $T = \{S_1, S_2, S_3, S_4\}$ ; $K = \{C_1, C_2, C_3\}$

Example of classifier-vector: V = (2, 1, 3, 2)

Interpretation: the first segment belongs to the class $C_2$,
the second to the class $C_1$, the third to the class $C_3$, and
the last one to the class $C_2$.
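The reading of a classifier-vector as a mapping from segments to classes can be sketched as below, together with a random initial population. The function names are hypothetical, and class numbers are kept 1-based as in the example above.

```python
import random


def classes_of(vector):
    """Group 1-based segment numbers by the class number stored at
    each position of the classifier-vector."""
    groups = {}
    for segment, cls in enumerate(vector, start=1):
        groups.setdefault(cls, []).append(segment)
    return groups


def random_population(n_segments, n_classes, size, seed=0):
    """Build `size` random classifier-vectors of length n_segments."""
    rng = random.Random(seed)
    return [[rng.randint(1, n_classes) for _ in range(n_segments)]
            for _ in range(size)]


V = (2, 1, 3, 2)
print(classes_of(V))  # {2: [1, 4], 1: [2], 3: [3]}
```

In the paper's setting, `n_classes` would initially equal the number of segments n, so that classes left unused at the end of the search simply stay empty.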

III. 4  The  function  of  adaptation 

III. 4. 1  Finality

The function of adaptation must evaluate the
intrinsic value of an individual, and hence the quality of
the classing that it encodes. This function is defined over
the set of individuals and gives a real value.

In an ideal classifier a, segments are grouped into
compact classes. This quality is characterized by the fact
that displacing a segment i from one class to another
(i.e., changing the class number located at position i in
the classifier-vector) could result only in a decreased
value of f(a). The individuals selected for reproduction
are those which possess the best values of f. Therefore,
this function directs the entire process of the genetic
search, and the quality of the whole GA model depends
essentially on it.

III. 4. 2  Choice of a criterion of similarity

It behoves the GA designer to conceive the most
efficient function of adaptation. In particular, this
function must be as discriminating as possible. This
discriminating feature involves two complementary
aspects:

- the evaluation of the internal cohesion of classes, i.e.,
the degree of similarity of segments within each class;

- the evaluation of the differentiation between
classes of the classifier-vector, i.e., the degree of
contrast existing between classes.

The function of adaptation we propose is based upon
the Jaccard score as criterion of similarity. This
score uses only the property of presence or absence of
unifs in segments, and constitutes a common measure
for evaluating the similarity of textual documents in
information retrieval. It has been used notably
for indexing (Gordon, 1988). However, a few other
criteria do exist (Salton, 1989).

The Jaccard score of a pair of segments $(X_j, X_k)$,
denoted $\mathrm{Sim}(X_j, X_k)$, is equal to the proportion of unifs
common to both segments (notation: $|X_j \cap X_k|$)
relative to the total number of unifs present in the two
segments:

$$\mathrm{Sim}(X_j, X_k) = \frac{|X_j \cap X_k|}{|X_j \cup X_k|}$$

where $|A|$ represents the cardinality of the set A.
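On the binary rows of the segments-unifs matrix, this score can be computed as sketched below. The function name is hypothetical, and returning 0 when neither segment contains any unif is our own convention for the undefined 0/0 case.

```python
def jaccard(row_j, row_k):
    """Jaccard score of two binary presence vectors:
    |X_j intersect X_k| / |X_j union X_k|."""
    common = sum(1 for a, b in zip(row_j, row_k) if a and b)
    present = sum(1 for a, b in zip(row_j, row_k) if a or b)
    if present == 0:
        return 0.0  # assumption: two empty segments share nothing
    return common / present


# Two unifs in common out of four present overall.
print(jaccard([1, 1, 0, 1], [1, 0, 1, 1]))  # 0.5
```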


- the internal cohesion of a class is evaluated by a
coefficient of internal cohesion denoted $\mathrm{IC}(C_i)$. The
coefficient is the normalized sum of the similarities of
the segments of this class taken two by two. It is defined
by:

$$\mathrm{IC}(C_i) = \frac{1}{N(i)} \sum_{X_j, X_k \in C_i} \mathrm{Sim}(X_j, X_k) = \frac{1}{N(i)} \sum_{X_j, X_k \in C_i} \frac{|X_j \cap X_k|}{|X_j \cup X_k|}$$

$N(i)$ is the number of combinations of the segments
of the class $C_i$ taken two by two.

- the differentiation of the classing is evaluated by
the coefficient of external dissimilarity, denoted
$\mathrm{ED}(C_i)$, and computed for a class $C_i$ in relation to all
other classes. It is defined as follows:

$$\mathrm{ED}(C_i) = 1 - \frac{1}{NC(i)} \sum_{X_j \in C_i,\, X_k \in CC_i} \mathrm{Sim}(X_j, X_k) = 1 - \frac{1}{NC(i)} \sum_{X_j \in C_i,\, X_k \in CC_i} \frac{|X_j \cap X_k|}{|X_j \cup X_k|}$$

$CC_i$ is the complementary set of $C_i$. This set
contains all of the segments that do not belong to $C_i$.
$NC(i)$ is the number of pairs $(X_j, X_k)$, with $X_j$ belonging
to $C_i$ and $X_k$ belonging to $CC_i$.

Finally, the function of adaptation f is equal to the
sum of these two coefficients, taken over all classes:

$$f(\text{individual}) = \sum_{i \in [1, NbC]} \left( \mathrm{IC}(C_i) + \mathrm{ED}(C_i) \right)$$

All  the  inter-segment  similarities  are  computed  at 
the  initialisation  step,  before  the  GA  search. 
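The function of adaptation can be sketched as follows, with the similarity table computed once beforehand. This is an illustration only. The paper does not say what happens when a class has fewer than two segments or when its complement is empty; here IC and ED are taken to be 0 in those cases, and empty classes contribute nothing. All function names are hypothetical.

```python
from itertools import combinations


def jaccard(row_j, row_k):
    common = sum(1 for a, b in zip(row_j, row_k) if a and b)
    present = sum(1 for a, b in zip(row_j, row_k) if a or b)
    return common / present if present else 0.0


def similarity_table(matrix):
    """Pre-compute Sim(X_j, X_k) for every pair of segments."""
    return [[jaccard(r, s) for s in matrix] for r in matrix]


def fitness(vector, sim):
    """f = sum over non-empty classes of IC(C_i) + ED(C_i)."""
    groups = {}
    for segment, cls in enumerate(vector):
        groups.setdefault(cls, []).append(segment)
    total = 0.0
    for members in groups.values():
        inside = set(members)
        outside = [k for k in range(len(vector)) if k not in inside]
        pairs = list(combinations(members, 2))           # N(i) pairs
        cross = [(j, k) for j in members for k in outside]  # NC(i) pairs
        # Assumption: IC = 0 for a class with fewer than two segments.
        ic = sum(sim[j][k] for j, k in pairs) / len(pairs) if pairs else 0.0
        # Assumption: ED = 0 when the complement of the class is empty.
        ed = 1.0 - sum(sim[j][k] for j, k in cross) / len(cross) if cross else 0.0
        total += ic + ed
    return total


matrix = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
sim = similarity_table(matrix)
print(fitness((1, 1, 2), sim))  # 3.0
print(fitness((1, 2, 1), sim))  # 1.0
```

On this toy matrix the classing that groups the two identical segments scores higher than the one that separates them, which is the behaviour the text asks of f.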

To summarise: the genetic search runs a number of
cycles equal to the maximum number of cycles
determined at the start-up of the GA. When the cyclic
processing has finished, the resulting population
represents the best classifiers that the GA could produce.
The last step is to select from this population the
classifier-vector which gives the greatest value of f. At
the end of the algorithm, a certain number of classes are
empty. This is both expected and hoped for, since the
number of classes was arbitrarily fixed at a high
number. Thanks to the function of adaptation, the
algorithm converges towards an optimum number of
non-empty classes.

III. 5  Experimentation  results 

The  experiment  was  carried  out  on  a  textual  sample 
drawn  from  Spirale,  a  Belgian  review  on  Education 
Sciences.  The  GA  was  developed  using  Matlab,  and  was 
integrated  into  a  software  platform,  Aladin  (Seffah  and 
Meunier,  1995),  developed  at  LANCI  for  the  CARAT 
approach. 

The probability of crossing-over was fixed at 0.8,
and that of mutation at 0.05. The number of
individuals in the initial population was 100, and the
number of generations (or iterations) was 300.
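To make these parameters concrete, here is a compact sketch of a genetic search over classifier-vectors that takes them as defaults. It is not the Matlab implementation described in the paper. Several points are our own assumptions: one-point crossing-over, a mutation that reassigns one random segment to a random class, elitist replacement, classes numbered from 0, and the conventions IC = 0 for singleton classes and ED = 0 for a class with an empty complement.

```python
import random
from itertools import combinations


def jaccard(a, b):
    union = sum(1 for x, y in zip(a, b) if x or y)
    return sum(1 for x, y in zip(a, b) if x and y) / union if union else 0.0


def fitness(vector, sim):
    groups = {}
    for segment, cls in enumerate(vector):
        groups.setdefault(cls, []).append(segment)
    total = 0.0
    for cls, members in groups.items():
        others = [k for k in range(len(vector)) if vector[k] != cls]
        pairs = list(combinations(members, 2))
        cross = [(j, k) for j in members for k in others]
        ic = sum(sim[j][k] for j, k in pairs) / len(pairs) if pairs else 0.0
        ed = 1.0 - sum(sim[j][k] for j, k in cross) / len(cross) if cross else 0.0
        total += ic + ed
    return total


def classify(matrix, pop_size=100, generations=300,
             p_cross=0.8, p_mut=0.05, seed=0):
    """Return the best classifier-vector found (class numbers 0..n-1)."""
    rng = random.Random(seed)
    n = len(matrix)
    sim = [[jaccard(r, s) for s in matrix] for r in matrix]  # computed once
    pop = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(v, sim) + 1e-9 for v in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                if rng.random() < p_mut:
                    child[rng.randrange(n)] = rng.randrange(n)
                children.append(child)
        ranked = sorted(pop + children,
                        key=lambda v: fitness(v, sim), reverse=True)
        pop = ranked[:pop_size]
    return pop[0]
```

A call such as `classify(matrix, pop_size=20, generations=30)` on a small presence matrix returns one list of class numbers, one per segment; classes that never appear in it are the empty classes mentioned above.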

The text was partitioned into 54 segments of about
50 words each, the end of a segment being determined
as follows: fifty words are counted from the beginning
of the segment, and then any words remaining up to the
next full stop are added to the initial fifty words to
constitute the whole segment. We had at our disposal
a lexicon composed of 1701 words, which was reduced
to 1360 roots after a preliminary process of
lemmatisation. So, the size of the text matrix was
(1360 x 54).
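The segmentation rule can be sketched as follows. This is a simplified illustration with a hypothetical function name: words are obtained by a whitespace split, and any word ending in "." is taken to close a sentence, so abbreviations would be misread as sentence ends.

```python
def segment_text(text, min_words=50):
    """Cut text into segments of at least min_words words, each
    extended to the next word that ends with a full stop."""
    segments, current = [], []
    for word in text.split():
        current.append(word)
        if len(current) >= min_words and word.endswith("."):
            segments.append(" ".join(current))
            current = []
    if current:  # trailing words that never reached a full stop
        segments.append(" ".join(current))
    return segments


# Small demonstration with a minimum of 3 words per segment.
print(segment_text("a b c d. e f. g h i j k. l", min_words=3))
# ['a b c d.', 'e f. g h i j k.', 'l']
```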

The genetic search decreased the number of classes
from 54 to 24. Here is a short sample of interpretable
results. For instance, Class 4 contains the following
segments 8 and 21, in which underlined italic words are
common unifs determined by the GA:

Segment 8 : « Finally, Joelle Delattre tackles the
problem of books for blind and sight-impaired children,
which some publishers indeed seem to be concerned
with, as they now offer albums specially designed for
them. As can be seen, this issue is characterized by a
certain diversity, but its unity seems to us to lie in the
prudence of the approaches: children's literature does
not represent all reading nor all literature, but it does
exist, with a sufficiently rich past and present. » (1).

Segment 21 : « The teacher convinced of the
importance of reading literature then faces the question
of the choice of texts and of how to transmit them. At
this level there are neither ministerial proposals of book
lists, as there are for the collège, nor a specialized
university teaching tradition to define methods; initial
and ongoing training is uneven and left largely to
individual initiative. In effect, the teacher who
chooses his own texts and his own course of action,
wittingly or not, is putting into play his whole personal
conception of culture and of the role of the school in
the education of the child. » (2).

Within these two segments, three unifs have been
included in the same class: child, literature, reading.
This result might facilitate the work of a user facing the
problem of reading and analysing a large text, by
suggesting a precise relation between the segments. A
more in-depth analysis of the results would of course
require the help of a terminologist or a specialist of the
field. Such help is just as indispensable at the moment
of preparation of the text matrix as it is at the end for
the analysis of the results.

(1) Original French: « Enfin, Joelle Delattre aborde le
problème des livres pour enfants aveugles et malvoyants,
dont certains éditeurs semblent d'ailleurs se préoccuper
en proposant maintenant des albums spécialement conçus
pour eux. Comme on le voit, une certaine diversité
caractérise cette livraison, mais son unité nous semble
résider dans la prudence des approches : La littérature de
jeunesse ne représente ni toute la lecture ni toute la
littérature, mais elle existe, avec un présent et un passé
suffisamment riches. »

(2) Original French: « Se pose alors, à l'enseignant
convaincu de l'importance de la lecture littéraire, la
question du choix des textes et des modalités de
transmission. En effet, il n'y a à ce niveau ni
propositions ministérielles de listes d'ouvrages, comme le
collège ; ni tradition d'enseignement universitaire
spécialisé pour définir de méthodes ; la formation
initiale et continue est disparate, largement laissée à
l'initiative de chacun. En fait, l'instituteur, qui choisit
ses textes et ses démarches, sciemment ou non, met en jeu
toute sa conception personnelle de la culture et du rôle
de l'école dans la formation de l'enfant. »

So, the genetic algorithm has built classes that
group segments of text offering some lexical
similarity. It has done this in a systematic, dynamic and
plastic manner. From these classes of segments the
processing flow can then either choose a particular
word and see the class of similar segments in which it
operates (its particular semantic contexts) or choose to
build a hyperlink between two similar segments.

Conclusion 

The use of the genetic algorithm that we have
applied to the analysis of a text is still at the
experimental observation stage. It is, however, a
very promising and very flexible territory, which
combines coding, the processing of data, computation
of probabilities, artificial intelligence, and the genetic
mode of evolution. The variants of the model presented
are numerous and are linked to the diversity and richness
of the genetic operators, to the multiple ways of coding
the solution, and to the conception of the function of
adaptation.

Research perspectives centre on a more
in-depth study of the genetic algorithm applied to
CARAT and on the comparison of its results with those
of other classifying systems that are presently being
developed, such as simulated annealing and the ART
neural network.

We believe that the modeling of CARAT by genetic
algorithms allows us to foresee solutions to a certain
number of problems of processing textual information
that require classification tasks. On the one hand,
classification allows regrouping the segments of a text
by optimizing the level of similarity used to relate
segments of information. On the other hand, by
the nature of its mathematical structure, which is of the
topological type, the GA permits the processing of a
body of text that is in constant evolution.

This algorithmic strategy is applicable to diverse
types of textual information processing systems, such
as terminological classification, thematic extraction in
text, and automatic generation of hypertextual relations.
Abstract

Semiotic Cognitive Information Processing Systems (SCIPS) are inspired by information systems
theory. In a rather sharp departure from CL and AI approaches, Computational Semiotics (CS)
modelling neither presupposes rule-based or symbolic formats for linguistic knowledge
representations, nor does it subscribe to the notion of world knowledge as some static structures that
may be abstracted from and represented independently of the way they are processed. Consequently,
knowledge structures and the processes operating on them are to be modelled procedurally and
implemented as algorithms. They determine SCIP systems as a collection of cognitive information
processing devices whose semiotic character consists in a multi-level representational system
of (working) structures emerging from and being modified by such processing. Corresponding to
these levels of emerging structures are different degrees of resolution [1] that account for varying
levels of representational granularity [8].

The emergence of semantic structure as a self-organizing process is studied on the basis of word
usage regularities in natural language discourse, whose linearly agglomerative (or syntagmatic)
and whose selectively interchangeable (or paradigmatic) constraints are exploited by text analysing
algorithms. These accept natural language discourse as input and produce, via levels of
intermediate representation and processing, a vector space structure as output which may be interpreted
as an internal (endo) representation of the SCIP system's states of adaptation to the external (exo)
structures of its environment as mediated by the discourse processed. The degree of
correspondence between these two is determined by the granularity that the texts provide in depicting the
exo-view, and the resolution that the SCIP system is able to acquire as its endo-view of it in the
course of processing the discourse.

The SCIP system's architecture is a two-level consecutive mapping of distributed representations of
systems of (fuzzy) linguistic entities. Being derived from usage regularities as observed in texts, these
representations provide for the aspect-driven generation of formal dependencies and their
interrelations in a format of structured stereotypes. Corresponding algorithms select and represent fuzzy
subsets (word meanings) as dispositional hierarchies that render only those relations accessible to
perspective processing which can (under differing aspects differently) be considered relevant. Such
dynamic dispositional dependency structures (DDS) have proved to be an operational prerequisite
to and a promising candidate for the simulation of content-driven (analogically-associative) instead
of formal (logically-deductive) inferences in semantic processing ([2], [3], [4]). Considered as states
which the SCIP system can enter, certain properties of these structures can be identified as results
of symbolic functions which were shown to correspond to basal referential predicates ([5], [6], [7]).
Thus, the dynamics of semiotic knowledge structures and the processes operating on them
essentially consist in their recursively applied mappings of multilevel representations resulting in a
multiresolutional granularity of fuzzy word meanings which emerge from and are modified by such
text processing. Test results from experimental settings (in semantically different discourse
environments) are produced to illustrate the SCIP system's granular language understanding and meaning
acquisition capacity without any initial explicit syntactic and semantic knowledge.
ABSTRACT: Semiotic analysis of text and
aesthetic literary criticism are brought closer together
by considering a human reader as an intelligent
symbolic system, between the kingdoms of animals
and pure spirit. We call a science integrating
aesthetics, cybernetics and semiotics cyberaesthetics.
It is a philosophical and mathematical theory of
intelligence related to Kantian philosophical
aesthetics and the dynamical concept of symbol.
The phenomenology of the literary text and of its
perception is used to compare the concepts of
cyberaesthetics and zoosemiotics.

In his book, "Sign and Its Masters", Thomas
Sebeok tells the following story. Once Sigmund
Freud entered an auditorium with a huge cigar in his
mouth. Students asked the respected professor
whether this cigar was not too reminiscent of a phallic
symbol. Freud responded: "Sometimes a cigar is
just a cigar". No doubt the students laughed, but it is
quite possible that someone was pondering a
question: "Just sometimes?" We recall this joke for
a reason.

Lev Tolstoy wrote a story, "After a Ball". The
essence of this story is as follows. A young man
was dancing at a ball in the home of his beloved,
to whom he was about to propose. The ball was
gorgeous. The young man especially remembered
one moment when his beloved danced with her
father, a regiment commander, a tall and powerful
man who looked like the emperor Nicholas the
First. After the ball, the young man could not sleep,
excited by the happiness he had just experienced, and in
the morning, walking around the town and passing by
the plaza, he saw that a soldier was being punished
there: he was being beaten. The officer commanding
the procedure was the very same colonel whom the
young man had admired just a few hours before. Here,
at the plaza, the very same man was cold and
unforgiving. And, bumping into his future son-in-law
and looking straight into his face, the colonel did not
even recognize him, even though he had been smiling
at him so gently at the ball just hours before. After this
terrible scene, the young man could not even think
about marrying the colonel's daughter, so terrified was
he by the man.

So, what is this story about? Possibly, that even
when blinded by love, a human being cannot lose
compassion for another human being. Violence is
deeply repugnant to us, even to the extent
that it can overpower love. Where violence
reigns, happiness is impossible, and it is
happiness that we are looking for in love. It is
important to remember that the story was written by
a writer who was at the time an active proponent of
the idea that evil should be opposed without
violence. The young man did not become a
revolutionary, did not attack the colonel, but could
not become his son-in-law.

Sigmund Freud was not surprised at his
students' question, because he taught them to ask
such questions. As far as Lev Tolstoy is concerned,
he would possibly be much surprised if he learned
about one contemporary American Slavic scholar's
idea. According to this scholar, the young man in
the story did not want to marry the tall and powerful
girl because, unconsciously, her body frightened
him. The young man was searching for a reason to
avoid the marriage. The cruelly beaten, bloodied body
of the soldier, looking like a piece of meat, became
associated in the young man's mind with the vagina,
which he despised a priori. According to the author of
this conception, at the moment of creating this story
Tolstoy was writing folk tales and was immersed in
Russian folklore, with its popular motif of the "vagina
with iron teeth". Another popular image is that of
Kastchey Undead, an old voluptuous evildoer. The
conclusion: the colonel dancing with his daughter at the
ball became associated in the young man's mind with
Kastchey Undead in a ritual dance with a future prey
of his violence.

This is not the place to argue with this
interpretation or its legitimacy. It transforms the writer
Tolstoy into a folktale teller, destroys the Christian
pathos of the story and replaces it with an openly
pagan canvas. Even so, the very existence of
this interpretation is sufficient to explore the criteria
for a valid interpretation. Which version is true? The
one that tells about the confrontation of good and evil,
love and violence, or the other that tells about
disgust at the female body, despite the sublime
passion? If both interpretations are true, then, purely
artistically, either they are incompatible, or they
provide a foundation for another interpretation: the
physiological is telling about itself through the
spiritual, making fun of it, and making fun of the
writer, who thinks that he inspires the reader by
socio-humanistic pathos. Just as the students'
interpretation is catastrophic to Freud, and to all
smokers of cigars along with him, the interpretation
of Professor Zolkovsky is catastrophic to the story,
or more accurately to the way that Tolstoy saw it.

Tolstoy intended to strike a catastrophic blow at
the idea of government violence. The developed
contemporary culture revealed the internal
catastrophic character of this very intent. New, searching
readers pointed out to Tolstoy the true nature of his
humanistic pathos: Tolstoy veiled his disgust at woman
with a socially and ethically noble fable.

What is of interest to us in this situation? That it
evolves in the aesthetic space, in a non-spatial
interpretive field of the text. Relationships among its
elements are not temporal, because they are already
accomplished; they are prior to our analysis.
Apriority is timelessness. The cybernetics of the
aesthetical is the nonspatial and atemporal relationship
of elements, which are defined by this very
relationship. This is a definition of a literary or, in
general, artistic image. An image is always a
manifold, that is, it consists of other images; it is a
virtual manifold of relationships. What is the
mechanism of the inter-relationship between an image
and a text? How is it possible to relate a text, which is
a spatio-temporal object, to an image, which is an
object of the aesthetic space? A text, in principle,
contains no catastrophe; catastrophe is a property of
the aesthetic space.

The first theory of mind that combined its higher
intellectual abilities with emotions and developed a
rational theory of aesthetics, including the concept of
beauty, belongs to Kant (1781; 1788; 1790). His
theory is based on three fundamental abilities or
faculties of mind: Understanding, Judgment, and
Reason. They are related to specific a priori
principles or instincts contained in the mind:
concepts, correspondence between concepts and the
manifold of matter, and will or desire.
Understanding is the faculty of concepts, a source of
general notions. Judgment is the ability to see that a
particular subset of the manifold comes under a
general rule. And Reason is the ability to draw
conclusions, that is, to act. The most important type
of action, interwoven with the higher intellectual
abilities, with the beautiful and the sublime, Kant
considered to be the act of learning. These three
abilities correspond to the three main modes of
consciousness: knowledge (of concepts), feeling (of
correspondence between internal concepts and the outer
world), and desire (to act). While Kant devoted a
separate book to each ability, they ought to be
combined within a dynamical system constantly
exercising all three abilities in their interaction.

Cyberaesthetics provides the mathematical
apparatus of Kantian theory (Perlovsky, 1996;
1997). It carries Kantian analysis further: it is a
dynamical system in which the three abilities
identified by Kant exist in a process of constant
interaction, as it were, in a "vortex". The judgment
evaluates the similarity between an a priori
concept-model and input data (sensory data, or
lower-level concepts). A high judgment-emotion
activates the learning-adaptation action, which
modifies the concept-model. And the modified model
changes the judgment-emotion. The vortex of
concepts-emotions-actions proceeds until an
equilibrium is reached: this takes a split second in
the case of a simple, less adaptive sign, and extends
to generations in the case of learning new cultural
paradigms. The mind is a system of interacting
vortexes, competing for evidence in the manifold of
matter. Each vortex is a process of semiosis, a
process of the dynamical formation of a symbol.
Perception of beauty is related to the ability of
complex adaptive symbolic systems to perceive a
similarity between the internal a priori model and the
outer world beyond any specific purpose, related
exclusively to a will for learning, that is, a will for
improving ourselves, our internal representations of
reality.
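
The loop of judgment and adaptation described above can be
illustrated with a small numerical sketch. It is only an
illustration under loud assumptions, not the mathematical
apparatus of Perlovsky (1996; 1997): the concept-model is
reduced to a single number, the judgment to a Gaussian
similarity measure, and the adaptation to a step toward the
mean of the data, taken on every pass. The names similarity,
vortex, rate, and tol are hypothetical.

```python
import math


def similarity(model, data, width=1.0):
    """Judgment: mean Gaussian similarity between the concept-model and the data.

    The Gaussian form is an assumption made for this sketch only.
    """
    return sum(
        math.exp(-((x - model) ** 2) / (2.0 * width ** 2)) for x in data
    ) / len(data)


def vortex(model, data, rate=0.5, tol=1e-6, max_steps=1000):
    """Repeat judgment and adaptation until the model stops changing.

    Returns the final model and the list of judgments made along the way.
    """
    target = sum(data) / len(data)
    judgments = [similarity(model, data)]
    for _ in range(max_steps):
        step = rate * (target - model)  # adaptation: modify the concept-model
        model += step
        # the modified model changes the judgment
        judgments.append(similarity(model, data))
        if abs(step) < tol:  # equilibrium: further adaptation is negligible
            break
    return model, judgments


if __name__ == "__main__":
    final_model, judgments = vortex(model=0.0, data=[4.8, 5.0, 5.2])
    print(final_model, judgments[0], judgments[-1])
```

In this toy setting the similarity starts near zero and grows
as the model approaches the data; the loop stops when a
further adaptation step would be negligible, which stands in
for the equilibrium mentioned in the text.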

"The  connection  of  ideas  does  not  imply  the 
relation  of  cause  and  effect  but  only  a  mark  or  sign 
with  the  thing  signified.  The  fire  which  I  see  is  not 
the  cause  of  the  pain  I  suffer  upon  approaching  it, 
but the mark that forewarns me of it" (Berkeley).
What Berkeley here calls "the mark" is an
internal sign. It is internal because it is inside my
body, inside myself. It is a sign of the fire, which
obviously does not burn. It is a semiotical aspect of
experience, an aspect that was isolated, emphasized,
and analyzed by semiotics. We see the sign of fire
because we look inside ourselves, and we look
not with our eyes but with our whole body, with its
biophysical memory, all of which is a part of our
internal concept-model of fire. When we imagine
something, like approaching fire, we do it with our
whole body.

This is why the Freudian interpretation of the
Tolstoy story is not illegitimate. This is an
interpretation made by the body. Therefore, this
interpretation could be much more dangerous for its
creator than for Lev Tolstoy. But what could be
said about our interpretation, which seems to us to
be so humane, sublime, spiritual, etc.? It is also the
body's interpretation. The blood-covered body
of the soldier became associated in the young man's
mind with his own body. He saw himself in the place
of the soldier, and because he was neither a sadist nor
a masochist, he was terrified. His actions are
affective, that is, undifferentiated, highly
single-minded. Multi-minded functioning, the
differentiation of thoughts and feelings, is an
aesthetical phenomenon, belonging to the realm of higher
emotions that do not deterministically control our
behavior through ancient affective systems, but are
an integral part of the intellect. A scientific approach
to nature is an aesthetic one, since it assumes
differentiation, multi-minded functioning, and
multiple methods. A human being is capable of
avoiding determinacy. This capability is a
manifestation of a cyberaesthetic "energy", a psychic
energy that is non-deterministically concentrated. The
adaptive logic of cyberaesthetics assumes freedom
with respect to space-time, freedom of
consciousness towards itself. The difference
between a symbol-image and a non-adaptive sign is
that the image is fuzzy; it "knows" of its nonidentity
with itself, which is manifest in its adaptivity, an
inherent property of the dynamical symbol of
cyberaesthetics. A bee functions in an image-reality
which is visible to our mind, but invisible to the bee.
A bee is a pure sign, that is, a pure text, the writing
itself, while a human being, in addition, is also the
author of the writing; a human exists as a will that
moves the hand.

Professor Sebeok quotes an important passage
from Peirce: "...thought is not necessarily connected
with a [human] brain. It appears in the work of
bees, of crystals, and throughout the purely physical
world... Not only is thought in the organic world,
but it develops there [and] there cannot be thought
without Signs" [Peirce, 1935]. Evidently, Peirce
is talking here about the universal organizing principle,
the universal spirit of Hegel, which through the
Schellingian Absolute traces back to the philosophy of
Jacob Boehme. A human being is a manifestation of
this universal spirit, like a bee or a crystal. But what
Peirce does not touch upon is the human being as an
internal catastrophe of the spirit, an internal freedom
in which the universal spirit does not know itself.
This is the "space" or realm that we call aesthetics.

The origins of aesthetics and semiotics are
complementary. The aesthetical realm is thought
without sign. The semiotical realm is sign without
thought. In this regard, zoosemiotics differs from
anthroposemiotics only quantitatively, in the number
and configurations of signs. The purpose of the
semiotical is to point to its aesthetical meaning, its
self-other, where signness disappears, yielding to
functionality. The meaning is a function of the sign.
But, as a function, the sign is an element of the
aesthetical realm; it virtualizes, dissolves in the
image. Cybernetical aesthetics is a science of control
of the sign-realm, a control that transforms signness
into the psychic energy of the combined sign and its
apriority, into the vortex of thought.

Here we would remind you of the most
idiosyncratic zoosemiotical example ever discussed
by philosophers. We are talking about Ludwig
Feuerbach's cat, which, instead of scratching out its
own eyes, still jumps at the mouse and eats it up. In
this way Feuerbach argued against the Berkeleians,
Hegelians, and other idealists with whom he had
broken. From the cyberaesthetical point of view,
in jumping at the mouse the cat is indeed scratching out
its own eyes, the very same eyes with which it saw the
mouse. The cat, as it were, puts a new eye into itself;
it changes the space of its functioning,
transforms it, for now it obeys the will of a third, of
some true interpretant. Within the will of this
interpretant, the cat and the mouse are but
elements of some whole system, elements whose
continuous transformations are the conditions of the
system's existence. To cut it short, the mouse is a
self-interpretation of the cat. Without the "mouse",
the cat would be something else. We have come to the
notion of a system which cannot be understood as a
sum of individual objects. Understanding cannot
proceed through differentiation alone, but
requires a next step, synthesis. Similarly, every
instinct is a manifestation of a cyberaesthetical
phenomenon, a sign expression of the beyond-sign
reality.

In Professor Sebeok's book there is a story of
the honey-guide bird, which guides people to the
beehive with honey. It is not clear what causes this
bird's kindness, for seemingly it gains nothing. It
does not eat honey, nor has it any use for wax,
because it does not build nests: it puts its chicks into
others' nests. While telling us about this further
example of zoo-utilization of signs, Professor
Sebeok plays the role of a semiotician. But when he
asks what is behind this bird's actions, he leaves
the realm of the semiotical and enters the
cyberaesthetical, the realm of meaning, where the
bird as such does not exist. Semiotics is, as it were,
a Wilson cloud chamber collecting the traces of the
aesthetical. Let us assume some interpretation, for
example, that the bird "puts" a human into the
beehive, similar to putting its own progeny into
others' nests. So, if we assume that all the altruism
of the bird is dictated by its inborn "meanness", we
enter the realm of the aesthetical. The mechanism of our
analysis is the very cybernetical of the cyberaesthetical:
it is our attempt to bring the elements into
interaction in such a way that the purposiveness of
one is seen through the purposiveness of the other.
This is the principle of any discovery, artistic or
scientific.

From the semiotical point of view the bird is a
sign vehicle that is waiting for its interpretant. But
what is the interpretant about to interpret? If an
object exists before the interpretation, this means that
it has already been interpreted. So, purely
semiotically, the bird itself is not a puzzle. It is a
puzzle aesthetically, for in this way the bird is indeed
"a trace of a bird", a trace of a yet unknown
situation in which bird, human, and beehive are
combined in some whole whose configuration is not
yet manifest. We construct an assumed self-regulation
of these elements so that the configuration is
consistent, isomorphic with our experience. This is
the "mechanism" of any scientific discovery, of any
discovery whatsoever. Thus, the growth of our
knowledge is in some way similar to that of a
crystal: it takes care of the homogeneity of its
structure, reminding us of the universal spirit.

A very interesting story is told by Professor
Sebeok about the difficult life of the male flies of the
species Hilara sartor. A male kills an insect and with
this "wedding gift" flies to a female, who eats the gift
during intercourse. If she eats up the entire gift too
fast, she starts eating her beloved right during
intercourse. This love story may seem repugnant to
some of us, especially to those whose imagination,
like that of the young man in the Tolstoy story, lets
us identify our body with the body of a male Hilara
sartor. This feeling of disgust is our affective bodily
reaction. And what about beauty, which according
to Kant is related to the perceived purposiveness of
the object? Does Kantian theory support such a
relativism of beauty? The behavior of the Hilara
sartor female is quite purposive: she takes care of
producing offspring. Then why might it be
impossible for some of us to perceive beauty in this
case? Kant explains that beauty is related not to just any
purposiveness, but to the purposiveness of the object
with respect to our internal models, independent of
any utilitarian goal. Apparently, our models of love
are more purposive to us than what we see among
Hilara sartor. Even though our models are not as
directly utilitarian as theirs, some of us do not feel that
our models can improve by watching them. So this
species does not look beautiful to some of us. But
are we really that different from Hilara sartor?
Aren't their relationships all too similar to human
relationships? "Dowryless" husbands and wives
often feel quite bad, especially after the romantic part
of the relationship is over. If the husband does not
bring money home, the wife "eats him" in many
different ways. If the Hilara sartor male cannot always
buy himself off from the sexual appetites of his
partner, among humans this is also quite possible.
The ideal of beauty does not always match
reality, so some of us may enjoy observing Hilara
sartor relationships, if not for improving our ideals
of beauty, then at least for improving the adaptivity
of our models to some immediate situations in life.

The above analysis is an aesthetical one, not a
semiotical one. The art of comparison, of discovering
isomorphisms among phenomena, is a dramatic one,
in that semiotic signs, passive objects which are
predestined to remain objects of interpretation,
acquire a tendency to transform into another
category, to become a mechanism of cognition,
a mechanism of isomorphism, a mechanism of
transformation. To achieve this, a sign has to undergo
a transfiguration first: it has to die and rise again.
Relative to the semiotic space, the aesthetic one is
like a black hole, an annihilating antimatter. Our
taboos are borders keeping us from getting too close
to these black holes, to their annihilating nature. Every
science is filled with these taboos, as is any
sphere of human activity.

Ants of genus mirmekaphilus confuse the rear
side of other ants with the front part of their own
kind, and engage in intercourse "erroneously". "The
multiple resemblance between this icon and the
object for which it stands are so striking, subtle, and
precisely modulated that they can hardly be explained
away as an evolutionary coincidence" [14]. If
evolution followed a definite logic, there would be no
room for the black holes of the aesthetical in this
semiotic kingdom. There would be no freedom, no
catastrophicity. Professor Sebeok explains the indecent
behavior of the ants by confusion due to resemblance in
the appearances of various genera, but then how are we
to explain similar behavior in more advanced species,
where there is no confusion at the level of
sign-appearances? If there is a "confusion", it occurs in
the aesthetical space, where the meaning of signs is
formed and where a familiar sign may acquire a new
meaning. Semiotics emphasizes apriority,
considering it an imperative which can only be
violated erroneously: if not for a confusion of
appearances, the ant would be more decent.
Cyberaesthetics equally supports a very different
point of view: natural imperatives that evolved in the
course of evolution are of a relative, adaptive nature.
In particular, sexual drives are directed at more
than just the one definite utilitarian goal of gene
propagation. They could also be akin to a wider
purposiveness having no specific utilitarian goal, and
they are hostages of the camouflages and
countermeasures evolved throughout evolution.
And the sexual drive may burn the organism in the fire
of isomorphisms.

Returning to Tolstoy's story, we conclude
that both the "humanistic" and the "Freudian"
interpretations contain elements of truth. Still
another interpretation seems more interesting: the
young man identified his relationship to his beloved
with his relationship to her father, and, being
disappointed in the one, he lost his love for the other.
Thus, a much more refined isomorphism of sublime
ideas seems to take place in the aesthetical space of
the story. When a "Freudist", looking at a smoking
man, sees "the marks" that dissolve the smoker, his
perception is short-circuited through the "marking"
semiotic system directly to the bodily functions.
Similarly, a building perceives the bricks it is built of.
We cannot but quote Carl Jung: "I remember one Indian
coming up softly behind me while I was looking at the
mountain over the pueblo, and saying quite suddenly
in my ear, 'Don't you think all life is coming from
the mountain?' It was just in that way that Freud
talked about sexuality" [Schoenl, 1996].

The widow of a famous theologian was enraged
when, after the death of her husband, instead of
uncompleted manuscripts she found in his desk
pages torn from erotic magazines. The naive woman
was under the spell of the opposite extreme: she
thought that all life comes not from the mountain
but from the Sermon on the Mount. She perceived the
erotic interests of her husband as an error befitting
an ant, not a great theologian. It is a purely
semiotical error to hold that a great mind cannot mix
up the sinful with the sacred. In order to overcome the
gravitation of the zoosemiotic realm, one needs all
one's energy. Cyberaesthetics is a philosophy and
science of combining the emotional with the conceptual,
a theory of control of the invisible energy of the
visible.

THE HOLE IN THE SYSTEM
Floyd Merrell
Purdue  University 



Abstract. Physicist John Archibald Wheeler's concept
of the "boundary of boundaries" that lies at the heart of
what customarily goes as our physical world finds
resonance with Charles S. Peirce's notion of semiosis,
which includes the categories of Firstness, Secondness,
and Thirdness, possible, actual, and potential signs, the
sign components (representamen, object, interpretant),
and the complementarity between sign
overdetermination and sign underdetermination.

If Charles S. Peirce's semiosis is process, flux,
signs becoming signs, the question inevitably surfaces:
What, where, when, how, are signs, when considering
body as well as mind, self as well as other, sign as well
as interpreter, individual as well as his/her community
and the world 'out there'?

In desperation, we may happen to run onto that
Indian counterpart to Heraclitus, Nagarjuna, who
counsels that body, mind, self, world, are signs all. The
query "To sign or not to sign" is thus no query at all, for
all that is, is sign. Which is to say that (1) the sign is,
but also, (2) it is not, for there is no nonsign against
which to gauge it as a sign; in such case (3) it is both
sign and nonsign, but if that is the case, then (4) it is
neither sign nor nonsign. And yet, if we are to know
signs, all four of these possibilities are to be negated!
Nagarjuna brings about such a negation. All is no more
than appearance, but it is more than appearance, for it
is, but if it is, then it is not, and we find ourselves on the
merry-go-round once again. The very possibility of
negation is negated, which gives rise to affirmation, but
since there is nothing to affirm, then what was affirmed
is negated, so it is impossible either to affirm or to
negate, which is to say that there is both affirmation and
negation and at the same time there is neither
affirmation nor negation. In other words, in a manner
of speaking there is only 'emptiness'. 'Emptiness' is
not a matter of two unthinkables making a thinkable,
much as two Hegelian negations make an affirmation.
No. For, there is nothing, no-thing that is either one
thing or the other thing. There is just 'emptiness'.

Where all this jabber about 'emptiness' is
taking us, perhaps, is toward what physicist John A.
Wheeler dubs the 'boundary of the boundary', which is
another way of saying 'emptiness'. Yes, the secret lies
in the 'boundary of the boundary'. The boundary,
Wheeler implies, is tantamount to the square root of
minus one, √-1, the function of which is found in logic,
in mathematics and chaos theory, in computer
engineering, and in quantum mechanics and relativity
theory. √-1 is in a sense everywhere and nowhere. It is
as if it were 'emptiness' realized, paradoxically. The
boundary of the boundary, the sign of itself, the hinge
that allows for all the action: it's from there that the
whole show gushes forth.

Wheeler,  the  quantum  physicist,  tells  us  that  a 
boundary  at  its  simplest  best  and  at  its  confoundingly 
worst  begins  with  a  solitary  'direct  or  "oriented"  line—a 
one-dimensional  manifold'  that  has  for  its  boundary  'the 
starting point and the terminal point, both zero-
dimensional' [7]. Such a boundary is a hole cut out of a
sheet of paper with a pair of scissors. It consists of a
two-dimensional manifold, the cut itself defined as a
one-dimensional manifold. So, where is the boundary?
At the edge of the paper. What is the boundary, that is,
of what does it consist? Why, nothing at all. It is zero!
Zero, 'whatever the point at which we consider that line
to have started, that is also the point at which the line
terminates' [7]. A debt is incurred at the starting point
when the scissors penetrate the two-dimensional
manifold, and then that starting point is annihilated,
consumed, eaten up. Which is to say, if in our
three-dimensional world we were to carve out a cube, then,
like a square hole marring a Flatlander's
two-dimensional existence, there would be a section in our
world that would not hold water, or anything else for
that matter. Moreover, our cube separating our space
from 'nothingness', from 'elsewhere', would be a
boundary consisting of twelve lines (one-dimensional
boundaries) and six planes (two-dimensional
boundaries) making up a three-dimensional region, and
the sum of these boundaries is zero! Nothing. A
nothing that separates what is 'here-now' from mere
'emptiness'.

Wheeler demonstrates his point with the image
of a cube. Each of its six faces has an orientation in
three-dimensional space. Three spaces give the cube's
position, and three more give its movement along its
time-line, its world-line. In addition, each face of the
cube inherits from the interior an orientation, a swirl, a
direction of spin as it moves along its time-line toward
the future (in contemporary physics no moving body
faces forward and forward only, nor is it Janus-faced,
but in constant oscillatory, vibratory, undulatory
gyration). The one-dimensional boundary of one of the
faces of the cube is given a swirl by a line and an arrow.
There is a start, wherever that is, and an ending, but the
ending ends up at the starting point, and the starting
point is the ending, so they cancel each other out: zero
dimensions. But isn't the line a boundary? That's one
dimension, isn't it? But the line begins eating its own
tail, finally leaving nothing. Total washout. Kaput.

Actually, we cut the sheet of paper, a
two-dimensional universe, with three-dimensional scissors
from within our three-dimensional vantage point. The
question is, then: What kind of scissors could we
possibly use to cut a cube out of our own
three-dimensional space from within a fourth dimension?
Why, four-dimensional scissors, of course. A big
problem. Where do we get them and how can we use
them? We don't, and we can't, short of our playing the
impossible role of a Maxwell Demon and a Laplace
Superobserver all wrapped up into one. So we don't,
and we can't. Yet, let's see how far we can go. If
two-dimensional space is subject to one-dimensional
boundaries and three-dimensional space to
two-dimensional boundaries, then it stands to reason that
four-dimensional space must be subject to
three-dimensional boundaries. Wheeler depicts the whole
shebang as a 'hypercube'. The four-dimensional block
of spacetime at the center is bounded by eight
three-dimensional cubes according to the orientations of those
surrounding the block. Each cube sports six
two-dimensional faces, and each face is bounded by four
lines. Yet in the final analysis it's the same story: these
lines cancel themselves out, leaving zero. By the same
token, we would suppose that in four-dimensional space
the planes, serving as boundaries in our
three-dimensional space, would likewise annihilate each other
to yield 'nothingness'.
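
The cancellation described above can be checked mechanically.
The following is a minimal sketch, assuming the usual oriented
(cubical-chain) bookkeeping from algebraic topology rather than
anything taken from Wheeler's text: each cell of a unit cube is a
tuple whose entries are 0, 1, or a marker for a free direction,
and the names FREE, boundary, and solid_cube are hypothetical.
With signs attached to each piece, the boundary of the boundary
comes out empty for the square, the cube, and the hypercube.

```python
from collections import Counter

FREE = "*"  # marks a coordinate that ranges over the whole interval [0, 1]


def boundary(chain):
    """Oriented boundary of a chain given as {cell: integer coefficient}.

    A cell is a tuple such as (1, FREE, FREE): a face of the unit cube.
    Each free direction contributes its two end faces with opposite signs,
    and the sign alternates with the position of the free direction.
    """
    result = Counter()
    for cell, coeff in chain.items():
        free_axes = [i for i, c in enumerate(cell) if c == FREE]
        for k, axis in enumerate(free_axes):
            sign = (-1) ** k
            for end, end_sign in ((1, +1), (0, -1)):
                face = cell[:axis] + (end,) + cell[axis + 1:]
                result[face] += coeff * sign * end_sign
    # keep only the cells whose contributions did not cancel
    return {cell: c for cell, c in result.items() if c != 0}


def solid_cube(n):
    """The single n-dimensional cell: free in every direction."""
    return {(FREE,) * n: 1}


if __name__ == "__main__":
    for n in (2, 3, 4):
        faces = boundary(solid_cube(n))
        # number of bounding pieces, then the boundary of that boundary
        print(n, len(faces), boundary(faces))
```

Applied to one face at a time, the same function lists the edges
of the cube; each edge is shared by two faces and enters with
opposite signs, which is why the total vanishes.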

In a desperate attempt to get a better handle on
all this, let us turn our attention to Ludwig, an
imaginary four-dimensional demon. Ludwig would
essentially dwell in the central 'hypercube' with a
vantage point allowing him to peer into our innards and
view a stomach ulcer, a kidney stone, some lung crud,
and whatever other personal secrets we have stashed
away in that blemished temple, our body. How can we
possibly identify and empathize with Ludwig? We can't,
or at least it is well nigh impossible to do so. So we
must concede that while prolonged contemplation of
Ludwig would be interesting enough, plenty of meat for
a Ray Bradbury story, we really should be addressing
ourselves to signs of the sort we can sink our teeth into.
A difficult task, for if we ask ourselves what there is in
terms of signs, our signs, ourselves as signs, the answer
must be: Not much, really. We and our signs are
virtually infinitesimal in comparison to the entire realm
of the possible. In fact, we and our signs are hardly
anything at all. On the other hand, we are in a certain
sense virtually everything. We begin with 'emptiness',
zero, there in the 'center', wherever that is. Nothing
preceded it and it has no boundaries. Then we move on
to the equivalent of the 'empty set', a sort of 'noticed
absence', the absence of something that might have been
there but is not, with certain tentatively defined
boundaries, or of something that was never there, yet
there is at least the suggestion of boundaries, since the
set is 'empty': there is some domain that can possibly be
filled by something.

Then  we  move  on  to  the  first  sign  consisting  of 
the  Firstness  (qualisign,  quality,  mere  feeling  without 
consciousness  of  the  feeling  as  such)  of  the 
representamen  (sign  manifestation)  of  the  semiotic 
pbjgct,  and  of  the  interpTetant  (roughly,  the  sign's 
meaning),  This  consists  of  R-Q-J  relations  in  tripod 
fashion  about  the  'axis'  or  'node'  of  'emptiness'  such 
that  R  is  related  to  O  in  such  a  manner  in  which  they  are 
both  brought  together  by  the  mediation  of  I  at  the  same 
time  that  I  is  brought  into  relation  with  g.  and  Q  in  the 
same  way  that  they  are  brought  into  relation  with  it. 
The  relations  are  properly  triadic,  for  each  pair  of  terms 
depends  upon  the  existence  of  the  third  term  through  the 
existence  of  the  'node'.  We  have  moved  somewhere 
along  the  infinitesimal  thin  plane  within  that  fathomless 
cube,  our  four-dimensional  spacetime  continuum.  But 
whatever  this  most  elementary  sign  is.  it  is  not  for  us, 
not  yet  at  least.  That  is,  we  are  not  yet  conscious  of  it  as 
sentient  and  self-conscious  semiotic  agents  poised  and 
ready  to  puj,  whatever  there  is  into  the  service  of  the 
semiosic  stream  within  which  we  happen  to  find 
ourselves  at  the  moment.  Such  consciousness  gf  must 
await  further  sign  development.  So  we  move  on, 
somewhere  within  the  vast  expanse  of  the  cube,  Then 
we  become  aware  of  the  sign  as  something  other,  and  we 
have  gravitated  toward  the  Secondness  of  the  sign  as 
something  'out  there'  related  to  something  that  is  the 
object  of  our  attention  and  that  it  must  have  some 
meaning  of  some  sort  or  other.  This  relation  between 
ourselves  and  the  sign  is  at  this  moment  no  more  than 
Secondness, ourselves in contrast to some other. We see
the  sign,  and  sense  that  it  must  be  related  to  something 
else  and  that  that  relation  in  its  relation  to  us  must  bring 
about  the  emergence  of  some  meaning,  some 
interpretant. Then, an idea comes to mind, and we are
in Thirdness, and 'Lo and behold!', there is a world 'out
there' awaiting our interaction and participation with it
in its and our self-organizing unfolding. Then we
exercise a move toward more developed signs: language
eventually issues forth, then dialogue, narrative,
discourse. 

But  somewhere  along  the  line  our  signs  cannot 
help becoming self-reflexive, a mirror of themselves. As
mirroring themselves, ad infinitum, they can never be
final. For, in Peirce's words:

what anything really is, is what it may finally
come to be known to be in the ideal state of
complete information, so that reality depends
on the ultimate decision of the community;...
In this way, the existence of thought now,
depends on what is to be hereafter; so that it
has only a potential existence, dependent on the
future thought of the community.
The individual man, since his separate
existence is manifested only by ignorance and
error, so far as he is anything apart from his
fellows, and from what he and they are to be, is
only a negation [4]

The  process  of  knowing  is  ongoing,  future  oriented; 
knowledge  in  the  full  sense  is  always  in  the  future  and 
not for us in the 'here-now'. Consequently, the
'individual man' (or 'self', in Peirce's terms), is 'only a
negation'. Only a negation? Well, yes. In a manner of
putting it. Perhaps there's no other way adequately of
putting it. The negation is the negation of a boundary,
which boundary is an infinitesimal line, nothing,
'nothingness', 'emptiness'. Sheer emptiness.

So  we  are  back  to  that  again.  But  our  condition 
is  not  as  dire  as  it  might  appear.  We  might  hope  for 
Peirce's  envisioned  'ideal  state  of  complete  information' 
as  the  community  consensus  that  would  finally  be 
arrived  at  in  the  long  run  of  things.  However,  it  seems 
that our hopes are shattered anew, for, according to
Peirce, that community consensus will never come to
fruition short of infinite time and an infinitely extended
community  (i.e.  it  is  a  converging  series).  Yet  whatever 
knowledge  can  be  had,  however  limited,  and  whether 
consciously  and  willfully  expounded  or  tacitly  displayed, 
must  be  the  product  of  a  collective  effort,  through 
community  practice  and  dialogue. 

Speaking  of  language  and  dialogue,  let  us 
return  to  physics. 

BACK  TO  THE  FUTURE,  THEN? 

The  crisis  in  physics  during  the  first  quarter  of 
the  present  century  and  shortly  thereafter  brought  on  an 
unexpected  and  unwanted  crisis  of  language.  The  chief 
problem, Heisenberg [2] writes, was that 'no language
existed  in  which  one  could  speak  consistently  about  the 
new  situation'.  The  language  available  at  the  time  'was 
based  upon  the  old  concepts  of  time  and  space  and  this 
language  offered  the  only  unambiguous  means  of 
communication'.  It  was  a  matter,  Heisenberg  [2]  goes 
on,  of  waiting  'for  the  development  of  the  language, 
which  adjusts  itself  after  some  time  to  the  new 
situation'. Heisenberg sees Niels Bohr's
'complementarity principle' as encouragement 'to use
an  ambiguous  rather  than  an  unambiguous  language,  to 
use  the  classical  concepts  in  a  somewhat  vague  manner 
in  conformity  with  the  principle  of  uncertainty,  to  apply 
alternatively  different  classical  concepts  which  would 
lead  to  contradictions  if  used  simultaneously'  [2].  In 
other  words,  an  electron  could  be  described  either  as  a 
particle  or  a  wave  in  classical  language,  but  both 
descriptions  could  not  be  used  in  the  same  context 
without falling into contradictions. So whatever sort of
language was to be used in describing this new situation,
it  must  be  relatively  loose  and  vague. 

Heisenberg's  words  appear  strange,  coming 
from  a  physicist.  But  when  compared  to  the  way 
language  is  actually  used,  they  are  as  natural  as  can  be. 
Peirce,  in  this  vein,  was  also  an  ardent  proponent  of  the 
need  for  vague  and  ambiguous  signs  (of  Firstness)  as 
well  as  logically  precise  language  (of  Secondness  and 
Thirdness). For this reason he promised (though he
never delivered the finished article) a 'logic of
vagueness'  as  an  alternative  to  the  classical  principles  of 
identity,  noncontradiction,  and  the  excluded-middle.  A 
vague  and  ambiguous  language  contains,  within  itself, 
contradictory  and  incompatible  principles.  Such  a 
language  offering  the  possibility  of  various  alternatives, 
some  of  them  mutually  exclusive,  can  be  labeled 
overdetermined. In order to talk coherently, we must
attempt to differentiate between incompatibles, saying
either one thing or the other, but not both in the same
breath. But if something contains the possibility of
being two or more distinct things, then we can see it
and say it as now one thing, now the other thing. The
two seeings and sayings are alternatives, and either
might be effective in one context while the other is more
effective in another context: different alternatives
actualized  at  different  times  and  different  places,  all  of 
them  construed  to  be  valid  in  their  own  right,  are 
characteristic  of  an  underdetermined  system. 

Wittgenstein's  rabbit-duck  drawing  is 
ambiguous in terms of its Secondness: it is actualized as
generally either one thing or the other. Within the
sphere of Firstness, however, it is vague, both the one
and the other, though it remains as no more than a
possibility: it is overdetermined. Within the sphere of
Thirdness,  on  the  other  hand,  it  contains  the  potentiality 
for  actualization  as  neither  the  one  nor  the  other  but 
something  else,  in  fact,  many  possible  other  things.  It 
can  be  imagined  as  the  map  of  an  island,  it  can  be  a 
masked  person  at  the  Mardi  Gras,  it  can  be  a  strangely 
shaped  cloud,  or  it  can  be  just  squiggles.  In  this  sense  it 
is  underdetermined.  According  to  Bohr 
complementarity,  an  electron  behaves  like  a  particle 
under  one  set  of  circumstances  and  like  a  wave  under 
another  set  of  circumstances.  Expressing  the  electron  as 
now one thing, now another, is a revelation of language's
underdetermination. Of course, it is from another view
both a particle and a wave (overdetermined), though it
cannot manifest both of those characteristics in
simultaneity, nor could we be aware of both
characteristics  at  once  even  if  they  could  be  so 
manifested.  We  have  also  seen  this  use  of  language  in 
the  above  comments  by  Heisenberg,  especially  in  his 
observation  that  'the  situation  of  complementarity  is  not 
confined  to  the  atomistic  world  alone'  but  is  also  found 
in the arts and in everyday life situations [2].
In other words, when physicists use ordinary
language,  they  are  forced  to  talk  about  electrons  and 
quarks  in  about  the  same  way  as  they  talk  about 
anything else. And talk, well, it's just talk. Particles
and waves, black marks on white making up a book,
squiggles forming a rabbit or a duck, and so on, are
alternate ways of talking about our various 'semiotic'
worlds. Ultimately, in this respect, what the physicist
studies 'is not nature in itself, but nature exposed to our
method of questioning it. Our scientific work in physics
consists in asking questions about nature in the
language we possess and trying to get an answer from
experiment by the means that are at our disposal' [2].
For  Bohr  himself,  it  is  wrong  to  suppose  that  the  task  of 
physics  is  that  of  saying  what  the  world  is.  More 
adequately  stated,  physics  is  concerned  with  what  the 
physicist  can  say  about  the  world.  In  response  to  the 
argument  that  the  world  precedes  and  is  more 
fundamental  than  language,  Bohr  once  quipped:  'We 
are  suspended  in  language  in  such  a  way  that  we  cannot 
say  what  is  up  and  what  is  down.  The  word  "reality"  is 
also a word, a word we must learn to use correctly' [5].
Bohr's  suggestion  is  that  no  matter  how  we  talk  about 
the  world,  language  ends  up  saturating  it,  and  in  the 
process  we  finally  end  up  talking  about  talk  itself. 
'Stand  for',  'reference',  and  'representation',  like 
'reality', are also words, and to assume our words 'stand
for', 'refer to', or 'represent' something beyond them in
the big, wide world is of the substance dissolved
dreams are made of. Talk is ultimately related to talk
and  if  it  is  also  somehow  related  to  the  world,  so  much 
the  better.  But  we  should  be  mindful  that  this 
relationship  will  more  often  than  not  come  as  a  hopeful 
byproduct  of  the  talk. 

This  is  interesting  in  light  of  Wheeler's  post- 
Einstein 'physics of meaning'. The 'physics of
meaning' stipulates that without the scientist's asking
questions of the universe, talking to it, observing it,
interacting  with  it,  it  simply  is:  it  has  no  significance 
for  physics.  Thus  the  physicist  her/himself  is  a 
participatory  semiotic  agent,  a  sign  her/himself, 
interacting  with  the  entire  range  of  possible  signs, 
initially  actualizing  them  as  signs,  and  pushing  them 
along  the  road  towards  their  fulfillment  as  genuine  signs 
with  fully  developed  interpretants.  In  fact,  the  entire 
community  to  which  the  physicist  belongs  is  involved. 
Implicit  within  the  process  of  the  physicist's  interaction 
with the universe is the injunction, do such-and-such,
which  entails  participation  with  what  there  is.  This  is 
germane  to  Bohr's  complementarity  principle.  In  both 
cases  participation  of  the  semiotic  agent  is  a  must,  and 
by  means  of  such  participation  signs  become  full-blown 
signs,  in  the  process  taking  on  meaning. 

The  story  goes  something  like  this.  A  set  of 
signs, most effectively put in the form of a sentence, is
taken as an assertion that such-and-such is the case. It
is  at  the  same  time  an  injunction,  either  implicit  or 
explicit,  to  consider  the  practical  effects  the  assertion 
might  conceivably  have  with  respect  to  the  act  of 
following  the  injunction.  For  example,  if  you  say  'My 
speakers can take a 90 watt blast', I might take it that I
am invited to subject them to the full volume of the
amplifier. If I do so, and if they withstand the pressure,
then we know. If they don't, we still know, and you
turn  on  your  word  processor,  intent  on  writing  a  letter  to 
the  manufacturer  regarding  their  warranty.  So  'My 
speakers  can  take  a  90  watt  blast'  is  not  thoroughly 
meaningful  until  we  put  it  to  the  test.  In  other  words,  in 
Wheeler's [6] terms, we must put the signs (or
phenomena) to use, whether they be sign-events, in the
case of conceivable and actual physical effects, or
thought-signs, in the case of fictions, purely hypothetical
situations, or 'thought-experiments'.

This is comparable to Bohr's theory of meaning
emerging from his complementarity principle [3]. Is an
electron a particle or a wave? Since it cannot be a
particle and a wave simultaneously, one might surmise
that it is neither a particle nor a wave, or that it is both a
particle and a wave. In the first case, one does not
commit  oneself,  but  holds  out  for  a  more  general 
concept  of  'particle'  and  'wave';  in  the  second  case,  one 
embraces  a  contradiction  at  the  expense  of  admitting  a 
cloud  of  vagueness  into  one's  purview.  For  Bohr, 
neither  case  is  adequately  meaningful:  'an  electron  as 
particle'  is  meaningful  under  certain  actualized 
conditions,  and  'an  electron  as  wave'  is  meaningful 
under  other  complementary  conditions.  The  conditions 
must  be  actualized  before  the  signs  in  question  become 
genuine  signs.  Meaning  is  a  matter  not  of  what  we  can 
know  about  the  'real  world'  but  what  we  can  do  with 
signs  in  such  a  way  that  we  can  say  more  about  the 
'semiotically  real'  objects  surrounding  us. 

In a Peircean vein, the notion of both a
'particle' and a 'wave' exists in the overdetermined
sphere of possibilities (Firstness); neither a 'particle' nor
a 'wave' at the same moment has citizen's rights to the
underdetermined sphere of generalities (Thirdness),
ordained by convention, habit, and an incessant
movement towards an expansion of one's conceptual
grasp. In contrast, Bohr's now a 'particle', now a
'wave' (or either a 'particle' or a 'wave') holds true to
classical logic and the tenets of physical existence
(actualities, Secondness). It has to do with the things we
can  do  with  signs  and  the  things  they  do  to  us.  Quite 
obviously,  Firstness  and  Thirdness  break  out  of  the 
fetters  imposed  on  them  by  classical  logic,  for 
contradiction  and  excluded  middles  are  allowed,  while 
Secondness,  the  master  of  our  physical  world  as  we 
would  like  to  perceive  it  and  conceive  it,  generally 
remains faithful to classical principles. So a sign can be
drawn from a massive range of possibilities to become
an actual sign, which can then be meaningful in terms
of its practical consequences, its interrelation in the
world of actual signs. But its meaning will remain
incomplete, for between the neither and the nor (or the
either and the or) other signs always stand a chance of
popping  up  to  alter  whatever  meaning  was  to  be  had. 

WHAT  MORE  CAN  WE  DO? 

The  initiation  of  semiosic  flow,  I  would  submit, 
is  graphically  depicted  in  the  central  portion  of  Figure  1, 
where  there  is  nothing  more  than  an  infinitesimally  thin 
plane  or  boundary  separating  nothing  from  anything 
else:  it  is  mere  'nothingness',  'emptiness',  no  more 
than  folds  and  warps,  topological  distortions. 


FIGURE  1 


Yet  this  'nothingness'  or  'emptiness'  contains,  'within' 
itself,  the  sphere  of  possibilities  that  can  give  rise  to  the 
emergence  of  everything  that  is  in  a  particular 
'semiotically real' world. We could say of semiosis
what Wheeler says of geometrodynamics:

This great world around us: how is it put
together? Out of gears and pinions? By a
corps of Swiss watchmakers? According to
some multifaceted master plan embodying an
all-embracing corpus of laws and regulations?
Or the direct opposite? Are we destined to find
that every law of physics, pushed to the extreme
of experimental test, is statistical, as heat is,
not mathematically perfect and precise? Is
physics in the end 'law without law', the very
epitome of austerity? [7]
Now,  what  can  Wheeler  possibly  mean  by  'law 
without  law'?  That  the  boundary  is  where  the  action  is. 
And what is the boundary? Once again, zero, zilch,
nothing  at  all.  It  is  a  fold,  that  which  is  folded  within 
itself,  like  a  cavity  that  is  a  cavity  of  itself,  like  the 
origami  fold,  the  paper  of  which  is  superfluous,  for  what 
is  important  is  the  fold  itself,  a  fold  of  space  that  is  at 
the  same  time  an  enfolding  and  an  unfolding.  A 
hurricane  of  incessant  unfolding  of  the  enfolded  and 
enfolding of the unfolded. At this level, there is no
distinction between organic and inorganic matter,
between the living and the dead. Thus we are now told
by Ilya Prigogine, fractal geometers, and chaos
mathematicians, that all is self-organizing (codependent
emergence), like the self-organizing universe itself. A
geological  formation  is  enfolded  into  a  vein  of  gold, 
sheaves  and  shears  of  tectonic  plates  press  plants  into 
black  carboniferous  matter  and  ultimately  enfold  that 
into  a  diamond,  an  enfolded  chick  embryo  or  an  acorn 
unfolds once, twice, thrice, and virtually countless times.

But  please  don't  get  me  wrong.  The  unfolding 
of  the  enfolded  and  vice  versa  is  not  simply  a  matter  of 
tension  release  and  tension  build-up,  nor  is  it  dilation 
and  contraction,  but  rather,  it  is  evolving  and  involving, 
developing  and  enveloping,  convolution  and  involution. 
A  living  organism's  organs  are  involved,  enveloped, 
contracted  folds  that  got  that  way  after  generation  upon 
generation  of  evolution,  development,  and  dilation  and 
displaying  of  what  was  there  all  along  as  a  possibility. 
And now, they are there, within the organism, in the
process incessantly of enveloping, involving, enfolding,
implicating matter and energy from the outside in order
that they may sustain life: that is their charge and their
purpose in and for the organism. A cell is the product of
one-dimensional engenderment according to a few
letters and an algorithm in three-dimensional space to
create something 'here' that is distinguished from
everything 'there' by a two-dimensional wrap, a
relatively  smooth  fold,  a  boundary.  The  entire  surface  of 
a  living  organism  takes  up  a  minute  piece  of  three- 
dimensional  space,  and  at  the  same  time  that  surface  is  a 
boundary  in  four-dimensional  spacetime  in  the  sense  of 
Wheeler. 

'This  won't  do  the  trick,'  someone  retorts. 
'You  must  really  be  more  explicit.'  However,  how  can 
we say what there is in that 'emptiness' which allows of
no  saying  at  all?  How  can  we  say,  really  say,  what  there 
is  within  the  center  of  the  Figure  1  cube  where 
everything  is  enfolded  and  nothing  unfolded? 
Impossible!  As  impossible  as  geometrodynamic  field 
equations 'saying' how mass-momentum and energy
get a grip on spacetime at the boundary of the boundary.
It can be described through equations but hardly said
clearly and distinctly in natural language [7]. You
simply  can't  describe  the  enfolded  point,  the  center, 
which  is  the  equivalent  of  Wheeler's  zero  moment  of 
rotation.  How  can  anything  be  of  zero  rotation  in  a 
universe  in  which  all  movement  depends  upon  frames  of 
reference?  Assume  we  place  a  cube  of  ice  with  a  small 
buck-shot frozen in its center (like Figure 1) on a
skating rink floor. We rotate the cube, which is a quite
effortless  task  since  there  is  little  friction  between  ice 
cube  and  icy  surface.  The  cube  being  well-nigh 
symmetrical,  and  all  other  things  being  equal,  we  look  at 
the  buck-shot  and  observe  that  it  appears  to  be 
motionless.  As  a  virtually  infinitesimal  point  it  has  zero 
moment  of  rotation.  Everywhere  else  there  is  force 
imparted  with  a  certain  amount  of  energy,  but  this 
central  point  has  no  energy  and  it  exercises  no  force.  It 
is comparable to the hub of the Buddhist wheel. It is the
'image' of 'emptiness'!

'But',  someone  retorts,  'what  we  have  here  is  a 
mere point, clearly and distinctly 'nothingness', if not to
say 'emptiness'. You should really get back to the idea
of  boundary,  which  cannot  be  explained  away  in  such 
sophomoric  fashion'.  O.K.  Assume  we  have  a 
Linelander gliding happily along her two-dimensional
strip-world one extremity of which is given a twist and
connected to the other extremity to form a Möbius-strip.
Easy enough. She is a one-dimensional being residing
on a two-dimensional sheet curved and doubled back on
itself in three-dimensional space. In essence, she
inhabits  three  dimensional  space.  Now  suppose  she 
stretches  herself  out  along  the  strip  until  she  encounters 
her  hind  quarters  and  latches  onto  them.  She  has 
become  a  circle.  As  such,  she  is  finite  but  unbounded. 
Unbounded?  Of  course.  She  is  apparently  a  ring,  an 
unbounded boundary that bisects the Möbius-strip,
separating it into two worlds, this one and that one, the
one that is first 'above' and then 'below', and the other
one that is first 'below' and then 'above'. Now that is a
boundary  we  can  identify  with.  Is  it  not?  We  have  a 
one-dimensional  line,  or  Firstness,  that,  in  relationship 
with  its  other,  the  two-dimensional  strip,  brings  about  a 
'cut'  in  order  to  construct  a  binary  relationship  of 
Secondness, all by means of the mediating capacity of
Thirdness from our three-dimensional view wherein the
Möbius-strip could become a legitimate Möbius-strip.

Let us suppose we bisect our Linelander by
taking our scissors and cutting the strip in two pieces
precisely where she is lying in apparently so relaxed and
carefree a position. Now we should have two Möbius-
strips separated by our erstwhile Linelander. To our
surprise, however, the strip isn't bisected at all. It
remains intact. But it is definitely not the same strip.
No, that's not correct, not exactly. It is now half as wide
and twice as long as it was, and it sports two twists
instead of one. Zero sum game again! Then it is the
same strip, only evincing a different form. What did our
Linelander-boundary separate anyway? 'Nothing' from
'nothing'. The boundary was a zero boundary. While
we were cutting the strip, there was a direction, and time
passed, irreversible time. But when we finish we realize
that there had been no line that separated something
from something else, no direction, no irreversibility, and
no indication of time. There was no other, nothing
distinguishing this from that.

Now,  essentially  the  whole  strip  is  indivisible  if 
we cut it along its elongated surface. In this sense, it is,
most properly speaking, an icon, self-contained, self-
referential, unary (it is chiefly of Firstness). The strip
itself, then, is indexical insofar as it is capable of
incorporating two-dimensionality, directionality,
binarity (it is chiefly of Secondness). And the line is
symbolicity in the sense that it entails one-dimensional,
linear engenderment of signs (it is chiefly of Thirdness).
However,  the  fact  remains  that  the  strip  as  icon  in  the 
pure  sense  is  indivisible.  Try  to  sever  it  with  images, 
relations,  and  words,  and  it  remains  as  it  is,  though  of  a 
slightly  to  radically  different  countenance.  Yet  virtually 
nothing  has  changed.  The  sum  of  all  changes  is  no 
change—as  the  French  say,  the  more  things  change  the 
more  they  stay  the  same. 

Now,  what  happens  if  we  apply  our  Linelander 
within  her  Flatland  to  our  own  sphere  of  existence?  We 
take  the  cube  of  Figure  1,  metaphorically  tantamount  to 
our  three-dimensional  universe,  and  we  stretch  the 
right-hand side of it out, give it a twist, and bring it back
to its left-hand side and connect it. What we have is a
rectangularly  shaped  sort  of  skewed  torus.  Now  suppose 
we  split  our  angular,  twisted  torus  in  half  along  the 
shaded  plane  in  Figure  1.  We  now  know  what  the  yield 
will  be.  We  are  left  with  a  torus  half  as  thick  and  twice 
as  long  as  it  was.  Zero  change  once  again!  Total 
change  is  no  change  at  all.  The  iconicity  of  the  whole 
remains intact. What are we, within this whole
concoction, but, ourselves, minuscule twists, tiny warps,
skewed spots, in four-dimensional space? We are like
the buck-shot in Figure 1, which, if stretched out when
the cube is topologically transformed into a Möbius-
torus, becomes a line, a boundary that separates
'nothing' from 'nothing'.

So  much  for  stories  and  boundaries.  What  we 
are  left  with  is  a  flow  of  interdependent,  interrelated, 
merging,  diverging  and  converging,  signs.  Signs  all! 
Semiosis: the becoming of signs for us within our
world-line.

REFERENCES 

[1] Baer, Eugen (1988). Medical Semiotics. Lanham:
    University Press of America.
[2] Heisenberg, Werner (1958). Physics and
    Philosophy. Ann Arbor: Michigan UP.
[3] Murdoch, Dugald (1987). Niels Bohr's Philosophy
    of Physics. Cambridge: Cambridge UP.
[4] Peirce, Charles Sanders (1931-35). Collected Papers
    of Charles Sanders Peirce, C. Hartshorne and P.
    Weiss (eds.), vols. 1-6. Cambridge: Harvard UP.
[5] Petersen, Aage (1985). 'The Philosophy of Niels
    Bohr.' In Niels Bohr: A Centenary Volume, A.
    French and P. Kennedy (eds.), 299-310. Cambridge:
    Harvard UP.
[6] Wheeler, John Archibald (1980). 'Law Without
    Law.' In Structure in Science and Art, P. Medawar
    and J. H. Shelley (eds.), 132-68. Amsterdam:
    Excerpta Medica.
[7] Wheeler, John Archibald (1990). A Journey into
    Gravity and Spacetime. New York: Scientific
    American Library.




XIII  MULTIDISCIPLINARY 
APPLICATIONS  OF  SEMIOTICS 


Control  Mechanisms  for  a  Nonlinear  Model 
of  International  Relations 

Aron  Pentek,  Jim  Kadtke 

Institute  for  Pure  and  Applied  Physical  Sciences,  University  of  California  at  San  Diego,  La  Jolla,  CA  92093-0360 

Suzanne  Lenhart 

Mathematics  Department,  University  of  Tennessee,  Knoxville,  TN  37996-1300 

Vladimir  Protopopescu 

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6364


ABSTRACT 

Some issues of control in complex dynamical systems are
considered. We discuss two control mechanisms, namely a
short range, reactive control based on the chaos control
idea and a long-term strategic control based on an optimal
control algorithm. We apply these control ideas to simple
examples in a discrete nonlinear model of a multi-nation
arms race.

KEYWORDS:  chaos  control,  optimal  control,  arms 
race 

1.  INTRODUCTION 

The field of nonlinear dynamics offers the possibility to
describe high dimensional complex systems, which hitherto
have only been modeled stochastically or heuristically, in
terms of nonlinear low dimensional deterministic models.
The appeal of such models stems from two apparently
opposing features. On one hand the models, described by a
system of either coupled maps (discrete evolution) or
ordinary differential equations (continuous evolution), are
simple enough to warrant a rigorous and extensive analysis,
while on the other hand they display an extremely rich
dynamical behavior that captures some of the complexity of
the systems they are supposed to model. Due to these
features, low dimensional nonlinear models have been
extensively applied to ecologic, neuromorphic, societal, and
military systems. Recently the focus of these applications
has shifted from analysis and exhaustive exploration of the
phase and parameter spaces to control and prediction [15].

In this paper we discuss the application of two
conceptually different control mechanisms to simple discrete
dynamical systems of Richardson type used to model arms
races [5,8,13]. A short description of these models is
given in Section 2. In Section 3 we present our first
control strategy for arms races and international relations,
namely a short range, reactive control designed to react
to changes in international relations based on immediate
feedback utilizing the natural dynamics of the system. In
this case policy changes are typically planned only one
(time) step ahead. At each time step the control is revised
based on feedback that reflects the actual state of
the system and does not take into account any long term
strategic thinking. The main advantage of this short term
crisis management is that it typically requires only very
small changes in the system parameters. Indeed, due to
the inherently nonlinear and potentially unstable character
of the dynamics, this control is based on a chaos
control algorithm [12-14] and is used to stabilize unstable
periodic orbits or fixed points of the dynamics by
applying only small perturbations to (one of) the system
parameters. The basic idea is to perturb the system in
such a way as to drive it onto the stable manifold of the
desired orbit. On this manifold the natural dynamics of the
system will bring the trajectory closer and closer to the
desired fixed point or limit cycle.

The second procedure, called long range, strategic
control, reflects a longer term planning of policy changes that
are implemented consistently throughout the whole evolution.
The long term strategic control is achieved by an
optimal control algorithm [6] that we briefly discuss in
Section 4. In the optimal control framework the perturbations
(controls) are evaluated in order to minimize or
maximize some objective functional that depends on the
state and the controls. The actual form of this functional
essentially depends on the goal we want to achieve and
explicitly includes the reward and cost of that goal. The
main advantage of such planning is that the changes one
has to implement are known over the whole time span of
the system's evolution and provide an optimal solution
(an optimal cost effectiveness is realized for attaining the
desired goal). The basic disadvantage of this method
is that modeling errors, inherent perturbations, and
unforeseen factors will usually lead to a trajectory different
from the planned one.

2.  MODIFIED  RICHARDSON  MODEL 

To illustrate the control algorithms we choose a very
simple example related to the socio-political field.
Recently, Miller, Sulcoski, and Farmer have presented a
modification of the Richardson model (MRM) describing
an $N$-nation arms race, in the form of a discrete,
nonlinear set of coupled dynamical equations [8]. This
complex spatio-temporal system can be simply represented
by a coupled map lattice (CML). Specifically, the model
is given by


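[Equation (1), the MRM map, is not recoverable from the source. The fragment surviving at this point, $k_{13} = k_{31} = 0.4$, $x^1_{\max} = 0.6$, $x^2_{\max} = 1.5$, appears to belong to the parameter list of Case B.]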
where $x^a_i$ is the "total military capability potential"
(TMCP) of country $a$ at time $i$, $x^a_{\max}$ and $x^a_{\min}$ are the
maximum/minimum values for the TMCP sustainable
by the respective country $a$, and $k_{aa}$ and $k_{ab}$ are coupling
constants, representing internal threats and external
alliances, respectively. The internal instability index
$I_a = 2k_{aa}/(x^a_{\max} - x^a_{\min})$ determines whether country
$a$ behaves like a democratic ($I_a > 0$) or totalitarian
($I_a < 0$) nation [8]. Another interpretation classifies
these countries as non-militaristic and militaristic,
respectively. For particular parameter values, iteration of
this mapping produces a 2D landscape (1 spatial, 1 time) whose
topology describes the qualitative dynamics of the political
situations of the $N$ countries. Being globally coupled and
nonlinear, this model has the potential for rather rich
behavior, and so far stationary, periodic, and chaotic
parameter regimes have been identified. The relevance of
this model, as for the original Richardson model, is that
it may have solutions which yield considerable insight
into the possible dynamical regimes for the current
multi-sphere world order, allowing qualitative analysis
and prediction.

Case A) A four-nation model, consisting of a weak
democratic nation (1), a weak totalitarian nation (2),
a stronger totalitarian nation (3), and a strong democratic
nation (4). These countries form two alliances,
between the totalitarian and democratic nations, respectively,
which are relatively strong. The parameter values
are: $k_{11} = 1.0$, $k_{22} = -0.1$, $k_{33} = -0.3$, $k_{44} = 2.0$,
$k_{12} = k_{21} = 0.1$, $k_{13} = k_{31} = 3.0$, $k_{14} = k_{41} = -0.1$,
$k_{23} = k_{32} = -0.1$, $k_{24} = k_{42} = 0.1$, $k_{34} = k_{43} = 0.1$,
… $x^2_{\max} = 1.5$, $x^3_{\max} = 0.7$, $x^4_{\max} = 2.0$, … As
initial conditions (armament levels) we choose $x^1_0 = 0.4$,
$x^2_0 = 1.4$, $x^3_0 = 0.6$ and $x^4_0 = 1.6$. Figure 1(a) shows the
resulting time evolution of this system, which indicates
exponentially growing behavior (which is artificially
truncated after some time for illustrative purposes).

Case B) A three-nation model, with the following
parameter values: $k_{11} = 0.1$, $k_{22} = -0.9$,
$k_{33} = 0.3$, $k_{12} = k_{21} = 0.4$, $k_{23} = k_{32} = 0.7$,
$k_{13} = k_{31} = 0.4$, $x^1_{\max} = 0.6$, $x^2_{\max} = 1.5$, $x^3_{\max} = 0.7$,
$x^1_{\min} = 0.2$, $x^2_{\min} = 0.3$, $x^3_{\min} = 0.5$. As initial
conditions (armament levels) we choose $x^1_0 = 0.3$,
$x^2_0 = 1.1$, and $x^3_0 = 0.4$. This corresponds to a system with
two weak democratic nations (1 and 3) and a stronger
totalitarian nation (2) which is destabilized by an internal
security threat. The resulting complex time evolution of
this system is shown in Fig. 2(a), which indicates chaotic
evolution of the totalitarian nation's TMCP.

The instabilities in the MRM leading to fast oscillations
or chaotic evolution of the TMCP may be interpreted
as a transition to war or insurgency [7] (cf.
Figs. 1(a), 2(a)). The basic question we try to answer in
the next Section is whether or not such instabilities arising
at some stage of the dynamics of the interaction of
the nations can be controlled by actively changing the
international or internal relations, thus avoiding the
onset of "war". The question is difficult and here we
only address some simple examples. We will examine
the MRM dynamics and sketch some conclusions about
controllability, as follows from the MRM (1).

For this discussion, we assume that the model equations
and most estimated coefficients are known, and that the
only parameters that can be effectively changed are those
related to internal affairs ($k_{aa}$) and international relations
($k_{ab}$). Interpreted in the context of international
relations, we assume that a given government can attempt
to strengthen or weaken its military potential by
changing the internal relations within its own country, or
by carefully tailored international alliances with any of
the other countries. The other parameters of Eq. (1), the
maximum and minimum sustainable capability $x_{\max}$ and
$x_{\min}$, are assumed difficult to change on short time-scales
as they reflect economic structure, culture, demographics,
technological development, governmental structure,
etc.


3.  SHORT  RANGE  REACTIVE 
CONTROL 

Recently nonlinear dynamics has provided us with a series
of methods designed to control unstable saddle points
of complex systems, with only small changes in one or a few
of the experimentally accessible parameters. These methods
assume an active monitoring of the system and evaluation of
the new control parameters at each time step based on the
actual state of the system (active feedback, "closed-loop
control"). Although here we assume perfect knowledge of the
governing equations, in practical cases this is neither
realistic nor necessary, as most of the information needed
to control the system can be extracted by observing the
evolution of the system and the effect of parameter
variations (cf. Refs. [12,13]). We have to emphasize that
such an approach to stabilizing international relations
should not be regarded as a long-term solution, but rather
as short term crisis management.

Here we will consider the method of Ott, Grebogi and
Yorke [12], which is one of the most widely accepted chaos
control algorithms, and which has already been applied
to a variety of physical systems, such as magneto-elastic
ribbons, chaotic lasers, semiconductor devices, chemical
reactions, etc. The method applies only small perturbations
to drive the system to the curve in the phase space along
which the saddle point can be exactly reached, called the
stable manifold. Thus it takes advantage of the naturally
attractive dynamics along the stable manifold. The
reader is referred to Refs. [12,13] for a detailed description
of the method as well as to Ref. [14] for an extension to
higher dimensional systems.
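
The idea of steering a chaotic system with small parameter perturbations can be sketched on a much simpler system than the MRM. The following is a minimal illustration, not the authors' implementation: it uses the one-dimensional logistic map as a stand-in, where the stable manifold of the target degenerates to the fixed point itself, so the control simply chooses the perturbation that makes the linearized next deviation vanish. The function names, the nominal parameter `r0 = 3.9`, and the bound `max_dr` are assumptions made for this example only.

```python
def logistic(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def stabilize_logistic(x0, r0=3.9, max_dr=0.05, steps=20000):
    """Hold the map at its unstable fixed point using small parameter kicks.

    Returns (x_star, states, perturbations).
    """
    x_star = 1.0 - 1.0 / r0            # unstable fixed point for r0 > 3
    fx = 2.0 - r0                      # df/dx evaluated at (x_star, r0)
    fr = x_star * (1.0 - x_star)       # df/dr evaluated at (x_star, r0)

    x = x0
    states, perturbations = [], []
    for _ in range(steps):
        # Linearized dynamics: dx_next ~ fx * dx + fr * dr. Choose dr so
        # that the predicted next deviation from x_star vanishes.
        dr = -fx * (x - x_star) / fr
        if abs(dr) > max_dr:
            dr = 0.0                   # too far away: apply no control yet
        x = logistic(x, r0 + dr)
        states.append(x)
        perturbations.append(dr)
    return x_star, states, perturbations


if __name__ == "__main__":
    x_star, xs, drs = stabilize_logistic(x0=0.3)
    print("target fixed point:", x_star)
    print("final state:       ", xs[-1])
    print("largest kick used: ", max(abs(d) for d in drs))
```

The controller stays idle until the chaotic trajectory wanders close enough to the target that a perturbation within the bound suffices, which mirrors the "closed-loop" monitoring described above. For a saddle in two or more dimensions the perturbation would instead be chosen to place the next iterate on the stable manifold, as in Refs. [12,14].
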

FIG. 1. Time evolution of the TMCP for the uncontrolled
(a) and controlled (b) case. Here the short range reactive
control has been employed. Fig. (c) shows the time evolution of
the control parameter. Note that, except for the first time step,
the applied perturbation is very small.

Our main conclusions about controllability can be
summarized as follows: a) although in the mathematical
model the control can be sustained for arbitrarily long
times with only small perturbations, in practical cases this
procedure may fail after some time as the model may become
invalid; b) with the exception of a few special cases it
is not possible to control nations at their maximum capability
potential; c) when the nations are strongly coupled
and have nontrivial fixed points one has to employ a
higher dimensional control algorithm, and the controllability
condition should also be tested in a higher dimensional
framework as presented in Ref. [14].

This crisis-management control is most effective when
at least two of the nations have saddle points that differ
from their maximum sustainable potential. Such an
example is provided by Case A. This model results in a
fixed point at $r^* = (0.6422, 1.5, 0.7, 0.5779)$. This shows
the presence of two strongly democratic nations, one of
them at the edge of internal stability ($I_4 = 1.5$) (similar
behavior is observed at slightly lower $I_4$ values). The
strongly democratic nation has a large maximum sustainable
potential ($x^4_{\max} = 2.0$) but has a low armament
level. Figure 1(a) shows the time evolution with initial
TMCP values $x^1_0 = 0.5$, $x^2_0 = 1.5$, $x^3_0 = 0.7$ and $x^4_0 = 0.6$.
One can observe that while the weak democratic and the
totalitarian nations quickly reach a steady TMCP level,
the strong democratic nation's TMCP oscillates periodically
after some chaotic transients. Our strategy is now
to actively change one of the coupling coefficients between
the two democratic nations. Figure 1(b) shows the
time evolution during the control procedure and Fig. 1(c)
the evolution of the corresponding coupling parameter.


4.  LONG  TERM  STRATEGIC 
CONTROL 

This approach is based on the optimal control method
applied to discrete systems [6]. The optimal control
method can be summarized as follows. The system
is described by a scalar or vector state function depending
on a number of independent variables that take
values in the phase space of the system. The state satisfies
a (usually nonlinear) dynamical scalar or vector
equation that depends also on some parameters that take
values in the parameter space. One or more of these parameters
are considered external controls to be adjusted
at will. The goal is to optimize a given objective functional
that depends on the state and on the control(s).
From the state equation and objective functional one constructs,
in a canonical way, a non-homogeneous adjoint
equation for a (scalar or vector) adjoint variable. The
original state equation together with the adjoint equation
form the optimality system (OS). The optimality condition
yields an explicit, analytical formula for the optimal
control in terms of the solutions of the OS. By substituting
this expression for the control into the OS and solving it,
one obtains the optimal state and adjoint variable and
therefore the optimal control. The general framework
for optimal control of general nonlinear systems as developed
by J.-L. Lions [9] was recently applied to competitive
systems of social and military interest [10,11].
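
The construction just described can be written out for a generic discrete system. The following is a standard textbook form given only as a sketch: the symbols $f$ (state map), $L$ (running cost), $u_i$ (control), and $\lambda_i$ (adjoint variable) are notation assumed here and are not taken from Ref. [6]. Controls are taken to be unconstrained, and the cost is written as a function of the current state and control only.

```latex
\begin{aligned}
&\text{State equation:}      && x_{i+1} = f(x_i, u_i), \quad i = 0,\dots,N-1, \quad x_0 \text{ given},\\
&\text{Objective:}           && J(u) = \sum_{i=0}^{N-1} L(x_i, u_i),\\
&\text{Adjoint equation:}    && \lambda_i = \frac{\partial L}{\partial x}(x_i,u_i)
      + \left(\frac{\partial f}{\partial x}(x_i,u_i)\right)^{\!\top} \lambda_{i+1},
      \quad i = 1,\dots,N-1, \quad \lambda_N = 0,\\
&\text{Optimality condition:} && \frac{\partial L}{\partial u}(x_i,u_i)
      + \left(\frac{\partial f}{\partial u}(x_i,u_i)\right)^{\!\top} \lambda_{i+1} = 0,
      \quad i = 0,\dots,N-1.
\end{aligned}
```

The term $\partial L/\partial x$ is what makes the adjoint equation non-homogeneous. The state equation runs forward from $x_0$ and the adjoint equation runs backward from $\lambda_N = 0$; together they form the optimality system, and solving the optimality condition for $u_i$ gives the control in terms of the state and adjoint variables.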

As an example for this control strategy we chose the
three-nation model of Case B described in Section 2. The
overall goal is to minimize the oscillations of the totalitarian
nation's TMCP by changing its internal stability
coefficient ($k_{22}$). We also want to achieve this goal at
an optimal cost, while simultaneously keeping the
democratic nations stable. The corresponding choice of
the cost function is

FIG. 2. Time evolution of the TMCP for the uncontrolled
(a) and controlled (b) case. The long range strategic control
is applied to minimize the oscillations of the totalitarian
nation's TMCP (*) at a minimal cost. The parameters used are
$a_1 = 0.2$, $a_2 = 1.0$, $a_3 = 0.2$ and $\gamma = 0.01$. Fig. (c) shows the
time evolution of the control parameter.


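$$G = \sum_{i=1}^{N} \left[ a_1 \left(x^1_i - x^1_{i-1}\right)^2 + a_2 \left(x^2_i - x^2_{i-1}\right)^2 + a_3 \left(x^3_i - x^3_{i-1}\right)^2 - \gamma\,(\ldots) \right] \qquad (2)$$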


where $a_1$, $a_2$, $a_3$ and $\gamma$ are positive constants weighting
the relative importance of the "goals" and the "cost",
respectively. In this example we choose an $N = 3$ time
step planning horizon, and correspondingly the control is also
projected three time steps ahead, reflecting a longer term
strategic objective. Fig. 2(a) shows the uncontrolled
trajectory, while Fig. 2(b,c) shows the controlled version and
the variation of the control parameter, respectively. According
to the choice of the cost function, the oscillations
in the totalitarian nation's TMCP have been minimized
at optimal cost.
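
To make the "goal" part of the cost function concrete, the sketch below evaluates only the weighted sum of squared one-step changes in each nation's TMCP. It is an illustration under stated assumptions: the control-cost term weighted by $\gamma$ is left out, the function name is hypothetical, and the two sample trajectories are invented for the example rather than taken from the model.

```python
def oscillation_cost(trajectory, weights):
    """Sum of weighted squared one-step changes in each nation's TMCP.

    trajectory: sequence of states, one per time step, each a sequence
                with one TMCP value per nation.
    weights:    one non-negative weight per nation.
    """
    total = 0.0
    for previous, current in zip(trajectory, trajectory[1:]):
        for a, x_prev, x_cur in zip(weights, previous, current):
            total += a * (x_cur - x_prev) ** 2
    return total


if __name__ == "__main__":
    weights = (0.2, 1.0, 0.2)  # weights quoted in the Fig. 2 caption
    # Made-up trajectories: nation 2 oscillating versus all nations steady.
    oscillating = [(0.3, 1.1, 0.4), (0.3, 1.3, 0.4), (0.3, 1.1, 0.4)]
    steady = [(0.3, 1.1, 0.4)] * 3
    print(oscillation_cost(oscillating, weights))
    print(oscillation_cost(steady, weights))
```

Because the weight on nation 2 is five times that on nations 1 and 3, swings in the totalitarian nation's TMCP dominate this part of the cost, which is consistent with the stated goal of damping its oscillations first.
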


5.  CONCLUSIONS 

In principle, each of the above control methods can be
used to achieve either stabilization or destabilization of
the dynamics of the MRM. The choice between the two methods
is not merely a political decision. In some cases one
or the other method may be more efficient, or one of these
methods may not be suitable at all. One simple example
is the problem of destabilizing stable fixed points of the
dynamics. If the fixed point is elliptic, the short range
reactive control does not work because small perturbations
applied to the system are unable to move the trajectory
away from the elliptic point's neighborhood. In general, short
range reactive control is better suited when the model of
the system is incomplete and cannot take into account
all relevant factors. Hence it is a good choice when one
does not have a well-defined overall goal or strategy and
in unforeseen crisis situations. The long range control
is better suited when one has a good idea about global
strategic objectives and the costs involved, and wants to
achieve a well-defined goal. Since the role of the user is
essential in deciding what control strategy to use, we see
here an excellent example of semiotic analysis applied to
international relations.


GALEN IN MEDICAL SEMIOTICS

Thomas  A.  Sebeok 
Indiana  University 


Ullmann  (1951:  161)  distinguished  among  four 
juxtaposed  branches  of  word-study:  "(1)  the  science 
of  names  (lexicology  if  synchronistic,  etymology  if 
diachronistic);  (2)  the  science  of  meanings 
(semantics);  (3)  the  science  of  designations 
(onomasiology);  (4)  the  science  of  concepts 
(Begriffslehre)."  Although  the  distinction  between 
designation  and  meaning,  particularly  as  displayed  in 
the works of German and Swiss semanticists (of what is
sometimes loosely, as well as erroneously, called the Trier-
Weisgerber School) is far from consistently drawn or
pellucid,  I  take  it  that  this  alterity  depends  on 
whether  one's  starting  point  is  the  name,  the  lexeme, 
or,  more  generally,  the  sign;  or  whether  it  is  the 
concept  or,  more  generally  yet,  the  object,  i.e.  the 
constellation  of  properties  and  relations  the  sign 
stands  for.  If  the  former,  the  analysis  should  yield  a 
semiotic  network  responsive  to  the  question:  what 
does  a  given  sign  signify  in  contrast  and  opposition 
to  any  other  sign  within  the  same  system  of  signs?  If 
the  latter,  the  analysis  should  reveal  the  sign  by 
which  a  given  entity  is  designated  within  a  certain 
semiotic  system.  According  to  Ullmann,  the  second 
inquiry  "is  the  cornerstone  of  Weisgerber's  structure" 
(1951),  but  I  believe  that  the  two  questions  are 
indissolubly  complementary.  In  any  case,  the  whole 
enterprise  critically  hinges  upon  how  the  investigator 
parses the sign/object (aliquid/aliquo) relation, and
what  the  conjunctive  stands  for,  in  the  judgment  of 
the  investigator,  entails. 

The  probe  becomes  at  once  more  intricate,  but 
also  more  intriguing,  when  the  lexical  field 
(Bedeutungsfeld?  Sinnfeld?  Wortfeld?)  being 
explored  happens  to  be  reflexive,  that  is,  self- 
searching. Such is the case of symptom (Sebeok
1986:45-58),  an  ancient  technical  term  in  both 
semiotics  and  medicine.  Thus  its  examination  may 
begin  in  the  inner  realm  of  the  lexicon,  if  viewed  as 
a  name,  or  in  the  outer  realm  of  clinical  experience, 
if  viewed  as  sense. 

One may properly inquire: what does the lexeme
symptom mean in language L1; or what does the
same lexeme designate, or reveal as a diagnostic
intimation, with respect to, say, an actual quality of
"diseasehood" (Fabrega 1974:123), that F.G.
Crookshank (in Ogden and Richards 1938:343)
foresightedly portrayed as "a mysterious substantia
that has 'biological properties' and 'produces'
symptoms"? In the end, the results of such
dichotomous inquiries amalgamate in a common
dialectical synthesis. For the purposes of this
exposition, L1 is American English. However, the
semantic field of "medical discourse," which is
typically  nested  within  wider  sets  of  concentric 
frames  (Labov  and  Fanshel  1977:36f.),  is  here 
assumed  to  be,  mutatis  mutandis,  very  similar  to  that 
in  every  other  speech  community  committed  to  the 
paradigm  of  medical  theory  and  practice  "in  the 
context  of  the  great  tradition"  (Miller  1978:184)  of 
thinking  marked  by  a  continuity  that  links  modern 
clinicians  with  the  idea  of  isonomia  launched  by  the 
brilliant  Alcmaeon  of  Croton  during  the  first  half  of 
the  fifth  century.  This  heritage  was  further 
consolidated  by  Hippocrates  —  arguably  considered, 
at  one  and  the  same  time,  the  "father  of  medicine" 
(Heidel  1941:xiii),  and  "der  Vater  und  Meister
aller  Semiotik"  (Kleinpaul  1972:103)  --  and  Plato, 
Aristotle,  and  the  Alexandrian  physicians  of  the 
fourth  century  B.C.  (Manetti  1987:57-134).  Equally 
perceptive  studies  of  symptom  have,  in  fact,  cropped 
up  in  the  semiotic  literature  (e.g.,  Baer  1982,  1986, 
1988)  and  in  the  medical  literature  (e.g.,  Prodi 
1981),  undertaken  by  savants  who  mutually  know 
their  way  around  the  other  field  as  well  as  their  own 
(see  also  Staiano  1979,  1982).  One  should,  however, 
continue  to  be  ever  mindful  of  the  admonition  of 
Mounin  (1981)  against  a  mechanical  application  of 
semiotic  (especially  linguistic)  concepts  to  medicine 
(especially  psychiatry). 

Symptom  always  appears  in  conjunction  with 
sign,  but  the  precise  nature  of  the  vinculum  is  far 
from  obvious  (as  in  MacBryde  and  Blacklow  1970 
or  Chamberlain  and  Ogilvie  1974).  The  basic 
semiosic  facts  were  perspicuously  depicted  by 
Ogden  and  Richards:  "If  we  stand  in  the 
neighbourhood  of  a  crossroad  and  observe  a 
pedestrian  confronted  by  a  notice  To  Grantchester
displayed  on  a  post,  we  commonly  distinguish  three 
important  factors  in  the  situation.  There  is,  we  are 
sure,  (1)  a  Sign  which  (2)  refers  to  a  Place  and  (3)  is 
being  interpreted  by  a  person.  All  situations  in  which 
Signs  are  considered  are  similar  to  this.  A  doctor 
noting  that  his  patient  has  a  temperature  and  so  forth 
is  said  to  diagnose  his  disease  as  influenza.  If  we  talk 
like  this  we  do  not  make  it  clear  that  signs  are  here 
also  involved.  Even  when  we  speak  of  symptoms  we 
often  do  not  think  of  these  as  closely  related  to  other 
groups  of  signs.  But  if  we  say  that  the  doctor 
interprets  the  temperature,  etc.  as  a  Sign  of 
influenza,  we  are  at  any  rate  on  the  way  to  an  inquiry
as  to  whether  there  is  anything  in  common  between
the  manner  in  which  the  pedestrian  treated  the  object 
at  the  crossroad  and  that  in  which  the  doctor  treated 
his  thermometer  and  the  flushed  countenance" 
(1938:  21). 

The  relation  of  sign  to  symptom  involves  either 
coordination  or  subordination.  If  the  distinction  is 
between  coordinates,  what  matters  more  than  their
inherent  meaning  is  the  mere  fact  of  the  binary 
opposition  between  the  paired  coordinate  categories. 
This  was  nicely  brought  to  the  fore  in  a  report  of  an 
investigation  of  the  symptom  "fatigue"  by  two 
physicians,  Shands  and  Finesinger:  "The  close  study 
of...patients  made  it  imperative  to  differentiate
carefully  between  'fatigue,'  a  feeling,  and 
'impairment,'  an  observable  decrement  in 
performance  following  protracted  effort.  The 
distinction  comes  to  be  that  between  a  symptom  and 
a  sign.  The  symptom  is  felt,  the  sign  observed  by 
some  other  person.  These  two  terms  cover  the  broad 
field  of  semiotics;  they  are  often  confused,  and  the 
terms  interchanged  [at  least  in  L1]  without  warning"
(Shands  1970:52).

This  passage  underscores  the  importance  of 
separating  the  "private  world"  of  introspection 
reported  by  the  verbal  description  or  nonverbal 
exhibition  of  the  symptoms  on  the  part  of  the  patient 
from  the  public  world  of  signs  reported  by  the 
description  of  status  or  behavior  observed  on  the  part 
of  the  physician.  As  I  had  written  earlier:  "It  is  a 
peculiarity  of  symptoms  that  their  denotata  are 
generally  different  for  the  addresser,  i.e.  the  patient
('subjective  symptoms,'  confusingly  called  by  many 
medical  practitioners  'signs')  and  the  addressee,  i.e. 
the  examining  physician  ('objective  symptoms,'  or 
simply  'symptoms')"  (Sebeok  1976:181).  Note  that 
only  a  single  observer  —  to  wit,  oneself  —  can  relate 
symptomatic  phenomena  or  events,  whereas  an 
indefinite  number  of  observers  —  including  oneself  — 
can  observe  signs.  Accordingly,  within  this 
framework  the  fact  of  privacy  looms  as  a  criterial 
distinctive  feature  that  demarcates  any  symptom 
from  any  sign  (Sebeok  1991:36-48).  Symptoms 
could  thus  be  read  as  recondite  communiques  about 
an  individual's  inner  world,  an  interpretation  that 
sometimes  acquires  the  status  of  an  elaborate  occult 
metaphor.  For  instance,  the  eating  disorder  anorexia 
nervosa  would  appear  to  be  reasonably  decipherable 
as  "I  am  starving  (emotionally)  to  death."  Its 
symptoms  are  sometimes  believed  to  result  from 
disturbed  family  relationships  and  interpersonal 
difficulties  (Liebman,  Minuchin,  and  Baker  1974a, 
1974b).  One  palpable  sign  of  this  ailment  is,  of 
course,  weight  phobia,  measurable  as  a  decrement  in 
the  patient's  mass. 

The  crucial  distinction  between  fatigue  and 
impairment  is  "similar  to  that  between  anxiety  as  a 
felt  symptom  and  behavioral  disintegration  often
exhibited  in  states  of  panic.  The  latter  is  a  sign,  not  a
symptom"  (Shands  1970).  The  dissemblance 
exemplified  here  is  obviously  related  to  Uexküll's
(1982:209)  notion,  maintained  both  in  the  life  and 
the  sign  science,  of  "inside"  and  "outside."  I  take  the 
pivotal  implication  of  this  to  be  as  follows: 
"Something  observed  (=  outside)  stands  for 
something  that  is  (hypothetically)  noticed  by  the 
observed  subjects  (=  inside).  Or  something  within 
the  observing  system  stands  for  something  within 
the  observed  system"  (ibid.).  For  any
communication,  this  complementary  relationship  is 
obligatory,  because  the  organism  and  its  Umwelt 
together  constitute  a  single  system.  The  shift  from 
physiological  process  to  semiosis  is  a  consequence 
of  the  fact  that  the  observer  assumes  a  hypothetical 
stance  within  the  observed  system 
(Bedeutungserteilung  /  Bedeutungsverwertung).

For  symptom  (in  L1),  there  exists  an  array  of
both  stricter  and  looser  synonyms.  Among  the 
former,  which  appear  to  be  more  or  less  commonly 
employed,  Elstein  et  al.  solely  but  extensively  use 
cue.  Although  they  do  so  without  a  definition,  their 
import  is  made  quite  clear  from  passages  such  as 
"cues  were  interpreted  by  physicians  as  tending  to 
confirm  or  disconfirm  a  hypothesis,  or  as 
noncontributory"  (1976:279).  Fabrega,  on  the  other 
hand,  seems  to  prefer  indicator,  but  he  uses  this 
commutably  for  either  symptom  or  sign;  and  when 
he  remarks  that  "all  indicators  may  be  needed  in 
order  to  make  judgments  about  disease"  (1974:126), 
he  surely  refers  to  both  categories  together.  The 
word  clue,  on  the  other  hand,  is  a  looser  synonym 
for  symptom:  generally  speaking,  where  symptom  is 
used  in  medical  discourse,  clue  is  found  in  the 
detectival  sphere  (Sebeok  1981a,  Eco  and  Sebeok 
1983). 

In  the  minimalist  coupling,  sign  /  symptom  are 
equipollent;  both  are  unmarked  vis-à-vis  one  another
(Waugh  1982).  Sometimes,  however,  symptom 
encompasses  both  "the  objective  sign  and  the 
subjective  sign"  (cf.  Staiano  1982:332).  In  another 
tradition,  symptom  is  a  mere  phenomenon  "qui 
précisément  n'a  encore  rien  de  sémiologique,  de
sémantique,"  or  is  considered  as  falling,  for  instance,  in
the  terminology  of  glossematics,  in  the  area  of 
content  articulation,  la  substance  du  signifiant,  an 
operationally  designated  figura  that  is  elevated  to 
full  semiotic  status  only  through  the  organizing 
consciousness  of  the  physician,  achieved  through  the 
mediation  of  language  (Barthes  1972:38f). 
However,  still  other  radically  different  sorts  of 
arrangements  occur  in  the  literature.  In  Bühler's
organon  model  (see  Sebeok  1981b),  symptom 
constitutes  but  one  of  three  "variable  moments" 
capable  of  rising  "in  three  different  ways  to  the  rank 
of  a  sign."  These  include  signal,  symbol,  as  well  as 
symptom.  Bühler  (1965:28)  specifies  further  that  the
semantic  relation  of  the  latter  functions  "by  reason  of
its  dependence  on  the  sender,  whose  interiority  it 
expresses."  He  clearly  subordinates  this  trio  of  words 
under  one  and  the  same  "Oberbegriff  'Zeichen',"  then 
goes  on  to  ask:  "Ist  es  zweckmässig,  die  Symbole,
Symptome,  Signale  zusammenzufassen  in  einem 
genus-proximum  'Zeichen'?"  It  should  also  be  noted 
that  Bühler's  first  mention  of  symptom  is
immediately  followed  by  a  parenthetic  set  of 
presumed  synonyms:  "(Anzeichen,  Indicium)." 
(Note  that  the  German  verb  anzeigen  bears  an 
ominous  secondary  judicial  connotation,  "to 
denounce  somebody"  [see  Sebeok  1981b:228f.].)

Thus,  in  acknowledging  the  importance  of  the 
notion  of  privacy  as  an  essential  unmarked  feature  of 
symptom,  Bühler  also  recognizes  that,  while  it  is
coordinate  with  two  other  terms,  it  is  also
subordinate  to  the  (unmarked)  generic  notion  of
sign,  namely,  that  kind  of  sign  that  Peirce  earlier,  but
unbeknownst  to  Bühler,  defined  with  much  more
exactitude  as  an  index  (Sebeok  1991:128-143). 

Despite  his  extensive  knowledge  of  medicine 
(Sebeok  1981a:37),  Peirce  did  not  often  discuss 
symptom  (nor  anywhere,  in  any  fecund  way, 
syndrome,  diagnosis,  prognosis,  or  the  like).  For 
him,  a  symptom,  to  begin  with,  was  one  kind  of  sign. 
In  a  very  interesting  passage,  from  the  dictionary 
lemma  "Represent,"  he  expands:  "to  stand  for,  that 
is,  to  be  in  such  a  relation  to  another  that  for  certain 
purposes  it  is  treated  by  some  mind  as  if  it  were  that 
other.  Thus,  a  spokesman,  deputy,  attorney,  agent, 
vicar,  diagram,  symptom,  counter,  description, 
concept,  premise,  testimony,  all  represent  something 
else,  in  their  several  ways,  to  minds  who  consider 
them  in  that  way"  (2:273). 

For  Peirce,  however,  a  symptom  was  never  a 
distinct  species  of  sign,  but  a  mere  subspecies, 
namely  the  index  --  or  secondness  of  genuine  degree
(in  contrast  to  a  demonstrative  pronoun,  exemplifying
secondness  of  a  degenerative  nature)  --  of  one
of  his  three  canonical  categories.  But  what  kind  of 
sign  is  this?  Peirce  gives  an  example  that  I  would 
have  preferred  to  label  a  clue:  "Such,  for  instance,  is 
a  piece  of  mould  with  a  bullet-hole  in  it  as  sign  of  a 
shot;  for  without  the  shot  there  would  not  have  been 
a  hole;  but  there  is  a  hole  there,  whether  anybody  has 
the  sense  to  attribute  it  to  a  shot  or  not"  (2.304).  The 
essential  point  here  is  that  the  indexical  character  of 
the  sign  would  not  be  voided  if  there  were  no 
explicit  interpretant,  but  only  if  its  object  were 
removed.  An  index  is  that  kind  of  a  sign  that 
becomes  such  by  virtue  of  being  really  (i.e. 
factually)  connected  with  its  object.  "Such  is  a 
symptom  of  disease"  (8.119).  All  "symptoms  of 
disease,"  furthermore,  "have  no  utterer,"  as  is  also 
the  case  with  "signs  of  the  weather"  (8.185).  We 
have  an  index,  Peirce  prescribed  in  1885,  when  there 
is  "a  direct  dual  relation  of  the  sign  to  its  object
independent  of  the  mind  using  the  sign...  Of  this
nature  are  all  natural  signs  and  physical  symptoms" 
(3.361). 

A  further  detail  worth  pointing  out  is  that  Peirce 
calls  the  "occurrence  of  a  symptom  of  a  disease...a
legisign,  a  general  type  of  a  definite  character,"  but 
"the  occurrence  in  a  particular  case  is  a  sinsign" 
(8:335),  that  is  to  say,  a  token.  A  somewhat  cryptic 
remark  reinforces  this:  "To  a  sign  which  gives 
reason  to  think  that  something  is  true,  I  prefer  to  give 
the  name  of  a  symbol;  although  the  words  token  and 
symptom  likewise  recommend  themselves"  (MS  787, 
1896).  Staiano  is  undoubtedly  correct  in  remarking 
that  "the  appearance  of  a  symptom  in  an  individual 
is  thus  an  indexical  sinsign,  while  the  symptom 
interpreted  apart  from  its  manifestation  becomes  an 
indexical  legisign"  (1982:331). 

Symptoms,  in  Peirce's  usage,  are  thus  unwitting 
indexes,  interpretable  by  their  receivers  without  the 
actuality  of  any  intentional  sender.  Jakobson 
likewise  includes  symptoms  within  the  scope  of 
semiotics,  but  cautions  that  "we  must  consistently 
take  into  account  the  decisive  difference  between 
communication  which  implies  a  real  or  alleged 
addresser  and  information  whose  source  cannot  be 
viewed  as  an  addresser  by  the  interpreter  of  the
indications  obtained"  (1971:703).  This  remark 
glosses  over  the  fact  that  symptoms  are  promptings 
of  the  body  crying  out  for  an  explanation  —  for  the 
construction,  by  the  self,  of  a  coherent  and 
intelligible  pattern  (which  of  course  may  or  may  not 
be  accurate;  cf.  Polunin  1977:91). 

Pain  comprises  one  such  symptom  that 
embodies  a  message  compelling  the  central  nervous 
system  to  influence  both  covert  and  overt  behavior 
to  seek  out  signs  of  pain,  throughout  phylogeny, 
ontogeny,  hic  et  ubique.  Miller  befittingly  expands:
"From  the  instant  when  someone  first  recognizes  his 
symptoms  to  the  moment  when  he  eventually 
complains  about  them,  there  is  always  an  interval, 
longer  or  shorter  as  the  case  may  be,  when  he  argues 
with  himself  about  whether  it  is  worth  making  a 
complaint  known  to  an  expert....  At  one  time  or 
another  we  have  all  been  irked  by  aches  and  pains. 
We  have  probably  noticed  alterations  in  weight, 
complexion  and  bodily  function,  changes  in  power, 
capability  and  will,  unaccountable  shifts  of  mood. 
But  on  the  whole  we  treat  these  like  changes  in  the 
weather...."  (1978:45-49).

Peirce  once  particularized  the  footprint  that 
Robinson  Crusoe  found  in  the  sand  to  be  an  index 
"that  some  creature  was  on  his  island"  (4:351),  and 
indeed  an  index  always  performs  as  a  sign  the 
vectorial  direction  of  which  is  toward  the  past,  or,  as 
Thom  put  it,  "par  réversion  de  la  causalité
génératrice"  (1980:194),  which  is  the  inverse  of
physical  causality.  Augustine's  class  of  signa
naturalia,  defined  --  in  contrast  to  signa  data  --  by  the
relation  of  dependence  between  sign  and  the  things
signified  (De  Doctrina  Christiana  II.1.2),  beside  its
orthodox  sense  (such  as  a  rash  as  a  symptom  of 
measles),  is  also  illustrated  by  footprints  left  by  an 
animal  passed  out  of  sight.  It  may  thus  be  regarded 
as  encompassing  a  portent,  or  in  the  most  general 
usage,  evidence  (for  instance,  as  a  south-westerly 
wind  may  both  signify  and  bring  rain,  i.e.,  give  rise 
to  its  significatum).  Thus  symptoms,  in  many 
respects,  function  like  tracks  --  footprints, 
toothmarks,  food  pellets,  droppings  and  urine,  paths 
and  runs,  snapped  twigs,  lairs,  the  remains  of  meals, 
etc.  --  throughout  the  animal  world  (Sebeok 
1976:133),  and  in  hunting  populations,  where 
humans  "learnt  to  sniff,  to  observe,  to  give  meaning 
and  context  to  the  slightest  trace"  (Ginzburg  1983). 
Tracks,  including  notably  symptoms,  operate  like 
metonyms.  This  trope  is  also  involved  in  pars  pro 
toto,  as  extensively  analyzed  by  Bilz,  who  spelled 
out  its  relevance:  "Auch  eine  Reihe  körperlicher
Krankheitszeichen,  sog.  funktioneller  oder  organneurotischer
Symptome,  haben  wir  unter  den
Generalnenner  der  Szene  gebracht,  einer
verschütteten  Ganzheit....  Hier  ist  es...eine
Teilfunktion  der  Exekutive...wobei  wir  abermals
auf  den  Begriff  der  Parsprototo  stiessen"  (1940:287).

It  is,  of  course,  Hippocrates  who  remains  the 
emblematic  ancestral  figure  of  semiotics  --  that  is  to
say,  semiology,  in  the  narrow,  particularly  Romance,
sense  of  symptomatology  --  although  he  "took  the
notion  of  clue  from  the  physicians  who  came  before 
him"  (Eco  1980:277).  Baer  alludes  to  a  "romantic 
symptomatology,"  which  he  postulates  may  have 
been  "the  original  one,"  carrying  the  field  back  "to 
an  era  of  mythical  consciousness"  (1982:18). 
Alcmaeon  remarked,  in  one  of  the  scanty  fragments 
of  his  book:  "As  to  things  invisible  and  things 
mortal,  the  gods  have  certainties;  but,  so  far  as  men 
may  infer...,"  or,  in  an  alternative  translation,  "men 
must  proceed  by  clues..."  (Eco  1980:281),  namely, 
provisionally  conjecture.  And  what  is  to  be  the  basis 
of  such  circumstantial  inference?  Clearly,  the 
concept  that  has  always  been  central  is  symptom 
(semeion;  Ginzburg  1983). 

While  Alcmaeon  is  commonly  regarded  as  the 
founder  of  empirical  psychology,  it  was  Hippocrates, 
a  clinical  teacher  par  excellence  (Temkin  1973),  who 
broke  with  archaic  medical  practice  where  the 
physician  was  typically  preoccupied  with  the  nature 
of  the  disease,  its  causes  and  manifestations, 
refocusing  directly  upon  the  sick  person  and  his/her 
complaints  --  in  brief,  upon  the  symptoms  of  disease:
"Nicht  so  sehr  die  Krankheit  als  das  Kranke 
Individuum"  (Neuburger  1906:196). 

For  Hippocrates  and  his  followers  symptoms 
were  simply  "significant  phenomena"  (cf.  Heidel 
1941:62).  Their  consideration  of  symptoms  as 
natural  signs  --  those  having  the  power  to  signify  the
same  things  in  all  times  and  places  --  was  of  the
most  comprehensive  sort.  An  early  discussion  of  this 
type  is  found  in  Hippocrates'  Prognostic  (XXV): 
"One  must  clearly  realize  about  sure  signs,  and  about 
symptoms  generally  [peri  ton  tekmerion  kai  ton  allon
semeion],  that  in  every  year  and  in  every  land  bad
signs  indicate  something  bad,  and  good  signs
something  favourable,  since  the  symptoms  [semeia]
described...prove  to  have  the  same  significance  in
Libya,  in  Delos,  and  in  Scythia.  So  one  must  clearly
realize  that  in  the  same  districts  it  is  not  strange  that 
one  should  be  right  in  the  vast  majority  of  instances, 
if  one  learns  them  well  and  knows  how  to  estimate 
and  appreciate  them  properly." 

I  have  previously  recalled  an  enduring  example 
of  his  method,  the  detailed  description  of  the  famous 
facies  hippocratica  (Sebeok  1979:6f);  another
example  may  here  be  cited  from  Epidemics  I:  "The 
following  were  the  circumstances  attending  the 
diseases,  from  which  I  formed  my  judgments, 
learning  from  the  common  nature  of  all  and  the 
particular  nature  of  the  individual,  from  the  disease, 
the  patient,  the  regimen  prescribed  and  the  prescriber 
—  for  these  make  a  diagnosis  more  favorable  or  less; 
from  the  constitution,  both  as  a  whole  and  with 
respect  to  the  parts,  of  the  weather  and  of  each 
region;  from  the  customs,  mode  of  life,  practices  and 
age  of  each  patient;  from  talk,  manner,  silence, 
thoughts,  sleep  or  absence  of  sleep,  the  nature  and 
time  of  dreams,  pluckings,  scratchings,  tears;  from 
the  exacerbations,  stools,  urine,  sputa,  vomit,  the 
antecedents  and  consequents  of  each  member  in  the
succession  of  diseases,  and  the  abscessions  to  a  fatal
issue  or  a  crisis,  sweat,  rigor,  chill,  cough,  sneezes, 
hiccoughs,  breathing,  belchings,  flatulence,  silent  or 
noisy,  hemorrhages,  and  hemorrhoids.  From  these 
things  we  must  consider  what  their  consequents  also 
will  be"  (Heidel  1941:129). 

In  The  Science  of  Medicine,  Hippocrates  also 
stated:  "What  escapes  our  vision  we  must  grasp  by 
mental  sight,  and  the  physician,  being  unable  to  see 
the  nature  of  the  disease  nor  to  be  told  of  it,  must 
have  recourse  to  reasoning  from  the  symptoms  with 
which  he  is  presented."  The  means  by  which  a 
diagnosis  may  be  reached  "consist  of  observations 
on  the  quality  of  the  voice,  whether  it  be  clear  or 
hoarse,  on  respiratory  rate,  whether  it  be  quickened 
or  slowed,  and  on  the  constitution  of  the  various 
fluids  which  flow  from  the  orifices  of  the  body, 
taking  into  account  their  smell  and  colour,  as  well  as 
their  thinness  or  viscosity.  By  weighing  up  the 
significance  of  these  various  signs  it  is  possible  to 
deduce  of  what  disease  they  are  the  result,  what  has 
happened  in  the  past  and  to  prognosticate  the  future 
course  of  the  malady"  (Chadwick  and  Mann  1950: 
87-89). 

However,  it  was  Galen,  whose  one  and  only  idol 
was  Hippocrates,  and  whose  medicine  remained  (on
the  whole)  Hippocratic,  who  attempted  to  provide
prognostics,  wherever  feasible,  with  a  scientific 
underpinning,  i.e.  to  base  his  forecasts  on  actual 
observations.  This  he  was  able  to  do  because  he 
practiced  dissection  and  experiment:  whereas 
Hippocrates  studied  disease  as  a  naturalist,  Galen 
"dared  to  modify  nature  as  a  scientist"  (Majno  1975: 
396;  cf.  Neuburger  1906:385).  "Empirical  method 
was  first  formulated  in  ancient  medicine,"  given
systematic  and  detailed  expression  in  the
Hippocratic  corpus  (De  Lacy  1941:121),  and  became 
a  part  of  the  theory  of  signs  of  the  Epicureans  and 
Sceptics,  in  opposition  to  the  Stoic  rationalistic 
position.  Philodemus'  fragmentary  treatise  (circa  40 
B.C.)  is  by  far  the  most  complete  discussion  of  a 
thoroughgoing  methodological  work  uncovered  (in 
the  Herculaneum  library)  and  extensively  elucidated  to
date.  Galen,  despite  all  of  his  Platonic  training,  was 
later  "forced  by  his  profession  to  be  more  empirical" 
(Phillips  1973:  174),  even  though  this  open-minded 
investigator,  who  continued  to  speak  with  the  voice 
and  authority  of  a  man  of  science,  did  gradually  turn 
into  something  of  a  dogmatic  mystic  (Sarton  1954: 
59).  He  can  therefore  be  regarded  as  a  subtle  founder 
of  clinical  semiotics.  As  such,  his  work
constituted  something  of  a  watershed,  since  "die 
galenische  Semiotik  verwertet  die  meisten 
Beobachtungs-  und  Untersuchungsmethoden  die  das 
Altertum  ausgebildet  hat"  (Neuburger  1906:  385). 

Although  Galen  can,  very  likely,  be  reckoned 
the  first  "scientific"  semiotician,  philosophy  and 
medicine  are,  as  well,  consistently  conjoined  in  his 
writings.  Barnes  (1991:50)  recently  reminded  us 
that,  according  to  Galen,  "The  best  doctors...are  also
philosophers,"  and  that  he  even  wrote  a  pamphlet  to 
show  how  a  competent  physician  "possesses  all  the 
parts  of  philosophy  —  logic,  natural  science,  ethics," 
to  which  one  may  dare  add:  semiotics,  or  logic  in  its 
most  technical  Peircean  sense  (on  Galen's  logic,  see
Barnes  1991:54-56  et  passim).  Galen  counted 
himself  among  "those  who  teach  the  greatest  and 
finest  human  achievements  —  the  theorems  which 
philosophy  and  medicine  impart."  No  wonder  that 
the  Emperor  Marcus  Aurelius,  as  Galen  himself 
informed  us,  referred  to  him  as  "the  first  among 
physicians,  unique  among  philosophers"  (quoted 
ibid.).  He  was  indeed  the  first  to  explicitly  articulate,
as  far  as  we  know,  the  kind  of  medical  semiotics  put 
in  practice  by  his  most  eminent  predecessors  —  in 
fact,  he  deemed  this  to  have  been  a  platitude  in  the 
Hippocratic  tradition  —  and,  having  reaffirmed  it, 
emulated  by  a  veritable  legion  of  professional 
successors  in  and  beyond  his  craft. 

Particularly  intriguing  is  Galen's  usage  of  the 
term  endeixis  —  roughly:  "indication"  —  in  medicine 
and  logic,  as  most  recently,  and  very  revealingly  (if 
not  exhaustively)  reviewed  by  Kudlien  (1991)  (with
added  commentary  on  its  prehistory  and  Medieval
Latin  renderings,  by  Durling  [1991]).  This  important
term,  so  cherished  by  Peirce,  is  very  seldom  found  in 
the  Hippocratic  Corpus  and  then  never  in  a  technical 
sense  or  in  the  sense,  or  rather  the  broad  spectrum  of 
senses,  employed  by  Galen.  Kudlien  rightly 
emphasizes  that  doctors  then,  as  now,  do  not 
deem  "indication"  to  be  "a  mere  'sign',"  but  rather,
as  he  puts  it,  an  action:  that  is,  a  sign  which  points
(usually)  to  the  treatment  of  certain  symptoms,  or 
the  syndrome,  of  the  disease  being  diagnosed 
(Kudlien  1991:103f.).  According  to  him,  the
evidence  confirms  that  "it  was  Galen,  above  all,  who 
used  the  word  in  the  modern  sense"  (ibid.  104),
along  with  such  derivatives  as  endeiktikos, 
endeiknynai,  synendeiknysthai,  antendeiknysthai, 
and  the  like.  In  brief,  for  Galen,  endeiktikon  is  by  no 
means  the  same  as  semeion  or  symptomata.  His  own 
usage  was  quite  different  from  those  of  the  adherents 
of  the  Empiricist  or  again  the  Methodist  schools  of 
physicians  (he  counted  himself  a  member  of  the  so-called
Dogmatist  school  of  medicine).  "When
seeking  for  the  true  'endeixis',  the  Dogmatist  would 
not  observe  all  possible  signs/symptoms  as  such  (as 
the  Empiricist  does),  but  select  only  those  that  are 
'indicative'  of  the  cause  (sc.  of  the  disease)..."
(Kudlien  1991:106).  In  brief,  in  Galen's  writings, 
true  endeixis  is  always  something  "logical."  As  he 
clearly  remarked:  "One  must  also  use  the  logical 
[viz.,  semiotic]  method  to  recognize  what  all  the
diseases  are,  with  regard  to  species  and  kinds,  and 
how,  for  every  disease,  one  must  take  an  'endeixis'  of 
the  therapeutical  measures  ('iamata')"  (quoted  in 
Kudlien  1991:107). 

Galen's  pen  was  as  busy  as  his  scalpel.  In  the 
course  of  his  exceptionally  bulky  writings,  he 
classified  semiotics  as  one  of  the  six  principal 
branches  of  medicine  (mere  iatrikes  ta  men  prota
esti,  to  te  phusiologikon,  kai  to  aitiologikon  e
pathologikon  kai  to  hugeinon  kai  to  semiotikon  kai 
to  therapeutikon  [XIV:689]),  an  ordering  that  had  a 
special,  indeed,  critical  importance  for  its  "effect  on 
the  later  history  of  medicine"  (Phillips  1973:172). 
The  strength  of  Galenism,  as  Temkin  (1973:179) 
also  emphasized,  "reposed  in  no  small  measure  in 
having  provided  medical  categories...for  relating  the
individual  to  health  and  disease,"  including 
"semeiology  (the  science  of  signs)." 

Of  semiotics,  Galen  further  specified:  Semeiosis
de  dai  eis  therapeia  men  anankaia,  all'  ouk  estin  aute
he  therapeia.  Dia  gar  tes  hules  he  therapeia
sunteleitai  kai  to  men  hulikon  aneu  therapeias  ouden
heteron  sumballetai.  To  de  semeiotikon  kai  aneu
therapeias  anankaion  pros  to  eidenai  tina
therapeutika  kai  tina  atherapeuta  kai  periistasthai
auta,  hopos  me  epiballomenoi
adunatois  sphallometha  (XIV:689).  At  the  end  of
this  same  chapter,  Galen  then  divided  the  field  into
three  enduring  parts:  in  the  present,  he  asserted,  its
concern  was  inspection,  or  diagnosis,  in  the  past 
cognition,  or  anamnesis  (etiology),  and  in  the  future 
providence,  or  prognosis  (diaireitai  de  kai  to
semeiotikon  eis  tria,  eis  te  epignosin  ton  pareleluthoton
kai  eis  ten  episkepsin  ton  sunedreuonton  kai  eis
prognosin  ton  mellonton)  (XIV:690).  His  clinical
procedure  was  depicted  by  Sarton  (1954:6)  thus: 
"When  a  sick  man  came  to  consult  him,  Galen... 
would  first  try  to  elicit  his  medical  history  and  his
manner  of  living;  he  would  ask  questions  concerning 
the  incidence  of  malaria  and  other  common  ailments. 
Then  the  patient  would  be  invited  to  tell  the  story  of 
his  new  troubles,  and  the  doctor  would  ask  all  the 
questions  needed  to  elucidate  them  and  would  make 
the  few  examinations  which  were  possible."  Galen 
regarded  "everything  unnatural  occurring  in  the 
body"  as  a  symptom  (VII:50,  135;  also  X:71ff),  and 
an  aggregation  of  symptoms  (athroisma  ton 
symptomaton)  as  a  syndrome.  He  was  fully  aware 
that  symptoms  and  syndromes  directly  reflected 
clinical  observation,  but  the  formulation  of  a 
diagnosis  required  causal  thinking  (cf.  Siegel  1973). 
He  was  the  master  of  foretelling  the  course  of 
diseases:  Galen  "pflegte...die  Prognostik  in
besonderem  Masse  und  nicht  den  geringsten  Teil 
seines  Rufes  als  Praktiker  dankte  er  richtigen 
Vorhersagungen"  (Neuburger  1906:  383).  Although 
his  prognostications  rested  essentially  and  loyally 
upon  the  Corpus  Hippocraticum,  his  own  anatomical
knowledge  and  exactitude  of  mind  predisposed  him 
to  build  up  his  prognoses  from  a  cogent  diagnostic 
foundation. 

It  would  not  appear  unreasonable  to  expect  a 
finely  attuned  reciprocal  conformation  between 
internal  states  and  "reality,"  between  an  Innenwelt 
and  the  surrounding  Umwelt,  or  more  narrowly, 
between  symptoms  and  their  interpretations  as  an 
outcome  over  time,  in  evolutionary  adaptation  — 
prodotto  genetico,  in  Prodi's  succinct  formulation 
(1981:973)  —  that  benefits  an  organism  by  raising  its 
"fittingness." 

But  such  does  not  reflect  the  state  of  the  art  of 
diagnosis.  The  probabilistic  character  of  symptoms 
has  long  been  realized,  among  others,  by  the  Port-Royal
logicians  (Sebeok  1976:125);  the  often  vague,
uncertain  disposition  of  symptoms  was  clearly  affirmed
by  Thomas  Sydenham,  the  seventeenth
century  physician  often  called  the  "English 
Hippocrates"  (Colby  and  McGuire  1981:21).  This 
much-admired  doctor,  held  in  such  high  regard  by 
his  brother  in  the  profession,  John  Locke,  was  also 
known  as  the  "Father  of  English  medicine"  (Latham 
1848:xi).  Sydenham  was  noted  for  his  scrupulous 
recognition  of  the  priority  of  direct  observation.  He 
demanded  "the  sure  and  distinct  perception  of 
peculiar  symptoms,"  shrewdly  emphasizing  that 
these  symptoms  "referred  less  to  the  disease  than  to
the  doctor."  He  held  that  "Nature,  in  the  production
of  disease,  is  uniform  and  consistent;  so  much  so, 
that  for  the  same  disease  in  different  persons,  the 
symptoms  are  for  the  most  part  the  same;  and  the 
self-same  phenomena  that  you  would  observe  in  the 
sickness  of  a  Socrates  you  would  observe  in  the 
sickness of a simpleton" (Latham 1848:14ff.). This
assertion  of  his  was,  of  course,  quite  mistaken, 
although  the  medical-student  jape  referred  to  by 
Colby  and  McGuire  "that  the  trouble  with  psychiatry 
is  that  all  psychiatric  syndromes  consist  of  the  same 
signs  and  symptoms"  (1981:23),  appears  to  be 
equally  exaggerated.  There  are,  to  be  sure,  certain 
diagnostic  difficulties  inherent  in  the  similarities 
between  the  symptomatology  of  functional 
syndromes  and  of  those  of  the  organic  maladies.  The 
marginal,  or  supplementary,  symptoms  of  the  former 
can,  however,  be  assimilated  according  to  specific 
criteria, such as are set forth, for instance, by Uexküll
(1979). 

This  set  of  strictures  leads  me  to  a  consideration 
of  an  aspect  of  symptom  that  is  seldom  mentioned  in 
the  literature  but  that  I  have  found  both  fascinating 
and,  certainly  for  semiotics,  of  broad  heuristic  value. 
This  has  to  do  with  anomalies,  a  problem  that  was 
considered,  in  a  philosophical  context,  especially  by 
Peirce.  According  to  Humphries  (1968:88),  a 
naturally  anomalous  state  of  affairs  is  such  "with 
respect  to  a  set  of  statements  which  are  at  present 
putatively  true,"  or,  putting  the  matter  in  a  more 
direct  way,  "any  fact  or  state  of  affairs  which 
actually  requires  an  explanation  can  be  shown  to  be 
in  need  of  explanation  on  the  basis  of  existing 
knowledge"  (1968:89).  The  enigmatic  character  of 
semiotic  anomalies  can  especially  well  be  illustrated 
by  clinical  examples,  where  few  existing  models  are 
capable  of  accounting  for  a  multitude  of  facts. 
Medicine  may,  in  truth,  be  one  of  the  few  disciplines 
lacking  an  overarching  theory,  although  local, 
nonlinear,  and  hence  restricted  and  over-simple 
paradigms,  such  as  the  "theory  of  infectious 
diseases,"  certainly  do  exist. 

Take  as  a  first  approach  to  the  matter  of 
anomalies the spirochete Treponema pallidum. This
bacterium, in its tertiary phase, may manifest itself as
("cause") aortitis in individual A, paretic
neurosyphilis  in  individual  B,  or  no  disease  at  all  in 
individual  C.  The  latter,  the  patient  with 
asymptomatic  tertiary  syphilis,  can  be  said  to  have  a 
disease  without  being  ill.  Note  that  a  person  may  not 
only  be  diseased  without  being  ill,  but,  conversely, 
be  ill  without  having  a  specifically  identifiable 
disease.  What  can  we  say,  in  cases  such  as  this, 
about the implicative nexus conjoining the
"proposition," i.e. the pathogen, with its consequent,
expressed  in  some  tangible  manner  or,  on  the 
contrary,  mysteriously  mantled?  Are  A,  B,  and  C  in 
complementary  distribution,  and,  if  so,  according  to 
what principle -- the constitution of the patient, or
some  extrinsic  factor  (geographic,  temporal,  societal, 
age-  or  gender-related,  and  so  forth),  or  a  coalition  of 
these?  The  influence  of  context,  one  suspects,  may 
be  paramount.  This  becomes  overriding  in  the  matter 
of  hypertension  —  not  a  disease  at  all,  but  a  sign  of 
cardiovascular  disorder  (Paine  and  Sherman 
1971:272)  —  which  is  realized  in  one  and  only  one 
restricted  frame:  within  that  of  patient/physician 
interaction,  assuming  the  aid  of  certain  accessories, 
such  as  a  sphygmoscope.  Semiosis  is,  as  it  were, 
called  into  existence  solely  under  the  circumstances 
mentioned;  otherwise  there  are  no  symptoms  (the 
asymptomatic,  or  so-called  silent,  hypertension  lasts, 
on average, fifteen years) -- there are no signs, and
there is, therefore, no determinate -- i.e. diagnosable --
object.

Studies  have  shown  that  the  majority  of  people 
who  have  gallstones  --  about  fifteen  million  or 
probably  many  more  Americans  among  them  —  go 
through  life  without  palpable  problems.  The 
presence  of  these  little  pebbles  of  cholesterol  that 
form  in  a  sac  that  stores  digestive  juice  can  clearly 
be  seen  on  X-rays:  the  shadows  are  the  "objective 
signs,"  but  most  of  them  never  cause  pain,  or  any 
other  symptom.  They  remain  mute.  They  are,  in 
other  words,  diagnosed  only  in  the  course  of  detailed 
checkups,  and  thus  require  no  surgical  intervention. 

Sensory  experiences,  at  times,  lead  to  semiosic 
paradoxes,  such  as  the  following  classic 
contravention.  A  hole  in  one  of  my  teeth,  which  feels 
mammoth  when  I  poke  my  tongue  into  it,  is  a 
subjective  symptom  I  may  elect  to  complain  about  to 
my dentist. The dentist lets me inspect it in a dental
mirror, and I am surprised by how trivially small the
aperture -- the objective sign -- looks. The question is:
which interpretation is
"true,"  the  one  derived  via  the  tactile  modality  or  the 
one  reported  by  the  optical  percept?  The  felt  image 
and  the  shape  I  see  do  not  match.  The  dentist  is,  of 
course,  unconcerned  with  the  size  of  the  hole,  filling 
the  cavity  he/she  beholds. 

It  is  a  common  enough  experience  that  the 
symptom  (for  reasons  ultimately  having  to  do  with 
the  chance  evolutionary  design  of  the  human  central 
nervous  system)  refers  to  a  different  part  of  the  body 
than where the damage is actually situated. "The
pain  of  coronary  heart-disease,  for  example,  is  felt 
across  the  front  of  the  chest,  in  the  shoulders,  arms 
and  often  in  the  neck  and  jaw.  It  is  not  felt  where  the 
heart  is  -  slightly  over  to  the  left"  (Miller  1978:22). 
Such  a  misreport  is  unbiological,  in  the  sense  that  a 
lay  reading  could  be  fatal.  An  even  more  outlandish 
symptom  is  one  for  which  the  referent  is  housed 
nowhere  at  all,  dramatically  illustrated  by  a  phantom 
limb after amputation. Miller writes: "The phantom
limb may seem to move -- it may curl its toes, grip
things,  or  feel  its  phantom  nails  sticking  into  its 
phantom  palm.  As  time  goes  on,  the  phantom 
dwindles,  but  it  does  so  in  peculiar  ways.  The  arm 
part  may  go,  leaving  a  maddening  piece  of  hand 
waggling  invisibly  from  the  edge  of  the  real 
shoulder;  the  hand  may  enlarge  itself  to  engulf  the 
rest  of  the  limb"  (1978:20).  What  is  involved  here  is 
an  instance  of  subjective  —  as  against  objective  — 
pain,  a  distinction  introduced  by  Friedrich  J.  K. 
Henle,  the  illustrious  nineteenth  century  German 
anatomist  and  physiologist,  and  generally 
perpetuated  in  classifications  of  pain  ever  since  (e.g., 
by  Behan  1926). 

Subjective  pain  is  described  as  having  "no 
physical  cause  for  existence,"  i.e.  there  is  no  organic 
basis  for  its  presence  (indeed,  with  respect  to  a  limb 
unhinged,  not  even  an  organ):  it  results  "of 
impressions  stored  up  in  the  memory  centers,  which 
are  recalled  by  the  proper  associations...  aroused" 
(Behan  1926:74f.),  which  is  to  say  that  the  pain 
remains  connected  with  a  framework  of  signification 
dependent  upon  retrospective  cognizance.  Referred 
pain  and  projection  pain  are  closely  allied;  the  latter 
is  a  term  assigned  to  pain  that  is  felt  as  being  present 
either  in  a  part  that  has  no  sensation  (as  in  locomotor 
ataxia)  or  in  a  part  that  because  of  amputation  no 
longer  exists. 

Certain  symptoms  —  pain,  nausea,  hunger,  thirst, 
and  the  like  --  are  private  experiences,  housed  in  no 
identifiable  site,  but  in  an  isolated  annex  that  humans 
usually  call  "the  self."  Symptoms  such  as  these  tend 
to  be  signified  by  paraphonetic  means,  such  as 
groans  or  verbal  signs,  which  may  or  may  not  be 
coupled  with  gestures,  ranging  in  intensity  from 
frowns  to  writhings.  An  exceedingly  knotty  problem, 
which  can  barely  be  alluded  to  here,  arises  from  the 
several  meanings  of  self  and  how  these  relate  to  the 
matter  of  symptomatology.  The  biological  definition 
hinges  on  the  fact  that  the  immune  system  does  not 
respond  overtly  to  its  own  self-antigens;  there  are 
specific  markers  that  modulate  the  system  generating 
antigen-specific  and  idiotype-specific  cell  lines  —  in 
brief,  activate  the  process  of  self  tolerance.  Beyond 
the  immunological  self,  there  is  also  a  "semiotic 
self,"  which  I  have  discussed  severally  elsewhere 
(cf.  Sebeok  1991,  Chaps.  3  and  4,  and  1992). 

Another  diacritic  category  of  symptoms 
deserves  at  least  passing  mention.  A  linguist  might 
be  tempted  to  dub  these  "minus  features,"  or 
subtractive  symptoms.  Here  belong  all  the  numerous 
varieties  of  asemasia  (Sebeok  1976:57,  1979:58) 
such  as  agnosia,  agraphia,  alexia,  amnesia,  amusia, 
aphasia,  apraxia,  etc.,  as  well  as  "shortcomings"  like 
blurred  vision,  hardness  of  hearing,  numbness  —  in 
short,  symptoms  that  indicate  a  deficit  from  some 
ideal  standard  of  "normality." 

In any discussion of symptoms, it should be
noted that even a syndrome, or constellation of
symptoms -- say, of a gastrointestinal character
(anorexia,  indigestion,  hemorrhoids)  —  may  not  add 
up  to  any  textbook  disease  labeling  or  terminology. 
Ensuing  treatment  may,  accordingly,  be 
denominated  "symptomatic,"  accompanied  by  the 
supplementary  advice  that  the  patient  remain  under 
continuing  observation.  In  some  circumstances,  "the 
syndrome  might  be  ascribed  to  psychologic 
etiology"  (Cheraskin  and  Ringsdorf  1973:37).  What 
this  appears  to  mean  is  that  the  interpretation  of 
symptoms  is  often  a  matter  involving,  over  time,  a 
spectrum  of  sometimes  barely  perceptible 
gradations,  entailing  a  progressively  multiplying 
number  of  still  other  symptoms.  It  is  also  worth 
remarking that, temporally, or for predictive purposes,
symptoms generally precede signs, which is
to  say  that  the  orderly  unfolding  of  evidence  may  be 
termed  prognostic. 

No  one  at  present  knows  how  afferent  neuronal 
activity  acquires  meaning,  beyond  the  strong 
suspicion  that  what  is  commonly  called  the  "external 
world,"  including  the  objects  and  events  postulated 
as  being  contained  in  it,  is  the  brain's  formal 
structure  (logos).  For  all  practical  purposes,  we  are 
ignorant  about  how  the  CNS  preserves  any  structure 
and  assigns  a  meaning  to  it,  how  this  process  relates 
to  perception  in  general,  and  how  it  induces  a 
response.  Implicit  in  this  set  of  queries  is  a  plainly 
linear  model:  for  example,  that  fear  or  joy  "causes" 
increased  heart  rate.  Not  only  does  such  a  model 
seem  to  me  far  too  simplistic,  but  there  is  not  even  a 
shred  of  proof  that  it  exists  at  all. 

The  future  of  symptomatology  will  clearly  rest 
with  program  developments  using  computer 
techniques  derived  from  studies  of  artificial 
intelligence.  These  are  intended  to  mime  and 
complement,  if  not  to  replace,  human  semiosic 
processes,  such  as  judgment  based  on  intuition  (in 
one  word,  abduction).  Such  automated  diagnostic 
counselors  are  already  operational,  as  for  example 
the  program  termed  Caduceus  (McKean  1982).  In 
the  simplified  example  illustrated  in  Fig.  1,  this 
program  "examines  a  patient  with  fever,  blood  in  the 
urine,  bloody  sputum  from  the  lungs,  and  jaundice. 
The  program  adds  together  numbers  that  show  how 
much  each  symptom  is  related  to  four  possible 
diagnoses  --  cirrhosis  of  the  liver,  hepatitis, 
pneumonia,  and  nephritis  —  and  picks  pneumonia  as 
top  contender.  The  runner-up  in  score  is  hepatitis. 
But  because  hepatitis  has  one  symptom  not  shared 
with  pneumonia  (blood  in  the  urine),  Caduceus 
chooses  cirrhosis  as  first  alternative.  This  process, 
called  partitioning,  focuses  the  computer's  attention 
on  groups  of  related  diseases"  (McKean  1982:64). 
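
The additive scoring and "partitioning" that McKean describes can be illustrated with a minimal sketch. Everything in it is hypothetical: the weights are invented for illustration and are not Caduceus's actual values, and the partitioning rule (accept as first alternative the best-scoring diagnosis that accounts for no finding the top contender fails to account for) is only one reading of the quoted passage.

```python
# Hypothetical weights: how strongly each finding points to each diagnosis.
# These numbers are invented for illustration; they are not Caduceus data.
WEIGHTS = {
    "pneumonia": {"fever": 3, "bloody_sputum": 4, "jaundice": 1},
    "hepatitis": {"fever": 2, "jaundice": 4, "blood_in_urine": 1},
    "cirrhosis": {"fever": 1, "jaundice": 3},
    "nephritis": {"fever": 1, "blood_in_urine": 4},
}

FINDINGS = ["fever", "blood_in_urine", "bloody_sputum", "jaundice"]


def score(diagnosis, findings):
    """Add together the weights of all observed findings for one diagnosis."""
    table = WEIGHTS[diagnosis]
    return sum(table.get(finding, 0) for finding in findings)


def explained(diagnosis, findings):
    """Return the observed findings that the diagnosis accounts for."""
    return {f for f in findings if f in WEIGHTS[diagnosis]}


def rank(findings):
    """Order the diagnoses from highest to lowest additive score."""
    return sorted(WEIGHTS, key=lambda d: score(d, findings), reverse=True)


def first_alternative(findings):
    """Assumed partitioning rule: skip any rival that explains a finding
    the top contender does not share, and return the next best rival."""
    ranked = rank(findings)
    top_findings = explained(ranked[0], findings)
    for candidate in ranked[1:]:
        if explained(candidate, findings) <= top_findings:
            return candidate
    return None


if __name__ == "__main__":
    print("ranking:", rank(FINDINGS))
    print("first alternative:", first_alternative(FINDINGS))
```

With these invented weights the ranking is led by pneumonia with hepatitis as runner-up, and the partitioning step passes over hepatitis (it explains blood in the urine, which pneumonia does not) in favor of cirrhosis, mirroring the outcome in the quoted example.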

[Fig.  1] 


The  craft  of  interpreting  symptoms  has  a 
significance  far  exceeding  the  physician's  day-to-day 
management  of  sickness.  As  Hippocrates  had 
already  anticipated,  its  success  derives  from  its 
psychological  power,  which  critically  depends  on  the 
practitioner's  ability  to  impress  his/her  skills  on  both 
the  patient  and  their  joint  environment  (the  audience 
gathered  in  this  workshop,  which  may  consist  of  the 
patient's  family  and  friends,  as  well  as  the 
physician's  colleagues  and  staff).  Dr.  Joseph  Bell,  of 
the  Royal  Infirmary  of  Edinburgh,  attained  the  knack 
with  panache,  leaving  his  imprint  not  just  on  clinical 
practice  but,  famously,  on  the  detective  story, 
following  in  the  footsteps  of  his  pupil's,  Dr.  Arthur 
Conan  Doyle's  fictional  realization,  Sherlock 
Holmes  (Sebeok  1981a,  Ginzburg  1983).  According 
to  recent  medical  thinking,  the  contemporary 
preoccupation  with  diagnosis  —  that  is,  the  doctor's 
perceived  task,  or  pivotal  drive,  to  explain  the 
meaning  of  the  patient's  condition  —  rests  in  the  final 
analysis  with  the  doctor's  self-assigned  role  as  an 
authenticated  expositor  and  explicator  of  the  values 
of  contemporary  society.  Disease  is  thus  elevated  to 
the  status  of  a  moral  category,  and  the  sorting  of 
symptoms  had  therefore  best  be  viewed  as  a  system 
of  semiotic  taxonomy  —  or,  in  Russian  semiotic 
parlance,  a  "secondary  modeling  system." 

Lord  Horder's  dictum  —  "that  the  most  important 
thing  in  medicine  is  diagnosis,  the  second  most 
important  thing  is  diagnosis  and  the  third  most 
important  thing  is  diagnosis"  (Lawrence  1982)  -- 
must  be  true,  because  medical  knowledge  has  risen 
to  the  status  of  a  means  of  social  control. 
Symptomatology  has  turned  out  to  be  that  branch  of 
semiotics  that  teaches  us  the  ways  in  which  doctors 
function  within  their  cultural  milieu. 

history, which I need to record along with my
acknowledgments. 
delivery at a Symposium on New Directions in
Linguistics  and  Semiotics,  convened  at  Rice 
University  at  the  initiative  of  Professor  Sydney 
form, in a book bearing the same Symposium title,
edited by James E. Copeland (Houston: Rice
University Studies, 1984), pp. 211-230. A somewhat
revised version appeared thereafter in my own book,
I Think I Am a Verb: More Contributions to the
Doctrine of Signs, pp. 45-58 (New York: Plenum
On October 19, 1992, I then presented, under the
new  title  "Medical  Semiotics:  The  Legacy  of  Galen," 
at  the  4th  Congress  of  the  Hellenic  Semiotic  Society, 
at  the  University  of  Thessaloniki,  Greece,  by 
invitation  of  the  co-organizers,  a  substantially 
reworked  but  still  necessarily  abridged  variant  based 
the Congress.

In  November,  1992,  while  I  was  both  a  Visiting 
Professor  of  Semiotics  at  The  University  of  Toronto 
and  a  Senior  Fellow  in  residence  at  Massey  College, 
I want to thank Professors Alexandros Lagopoulos
and  Karin  Lagopoulou  for  their  generous  hospitality 
in  Athens,  Iracleo,  and  especially  in  and  around 
Thessaloniki.  I  would  also  like  to  record  my  deep 
appreciation  to  Professor  Marcel  Danesi,  of  The 
University  of  Toronto,  for  providing  me  with  the 
opportunity  to  lecture  for  a  month  in  Ontario  and 
elsewhere  in  Canada,  and  to  write  at  leisure  in  the 
stimulating  environment  of  Massey  College. 

The  present  version  is  a  still  further  expanded 
variant  of  my  Greek  Congress  paper,  which  is  to 

The aim of this Report is to give a short presentation of
some ideas, methods and results of the statistical theory of
open systems.

I.  PREFACE 

Let us begin by quoting from the Preface to the author's
Statistical Physics (Moscow, Nauka, 1982; Harwood
Academic Publishers, New York, 1986) [1]:

"'My God! Yet another book on statistical physics!
There's no room left on my bookshelves.' Such emotions
are quite understandable. Before jumping to conclusions,
however, it would be worthwhile to read the Introduction
and look through the table of contents. Then the reader
will find that this book is totally different from the
existing courses, fundamental and concise." ... "The author
certainly does not wish to exaggerate the advantages of
the book, considering it as just the first attempt to create
a textbook of a new kind."

The second step in this direction was the author's
Turbulent Motion and the Structure of Chaos (Moscow,
Nauka, 1990; Kluwer Academic Publishers, Dordrecht,
1991) [2]. This book is subtitled A New Approach to
the Statistical Theory of Open Systems. Naturally, "the
new approach" is not meant to defy the consistent and
efficient methods of the conventional statistical
theory of nonequilibrium processes. "The new approach"
is based on a judicious combination of "the old"
and "the new".

The next, and more radical, step was the author's latest
book, Statistical Theory of Open Systems, Volume I,
subtitled A Unified Approach to Kinetic Description of
Processes in Active Systems, published by "Yanus",
Moscow, 1995, and Kluwer Academic Publishers,
Dordrecht, 1995 [3]. The manuscript of the following
book is now almost ready: Statistical Theory of Open
Systems, Volume II: Part 1, New Approach in the Plasma
Kinetic Theory (Myth on Collisionless Plasma); Part 2,
Kinetic Theory of Second Order Phase Transitions [4].
The author hopes that a further book will also be
prepared: Statistical Theory of Open Systems, Volume III:
Quantum Open Systems [5].

We must now try to answer the question: what does the
notion "Physics of open systems" mean?

II. PHYSICS OF OPEN SYSTEMS

"Physics of open systems" is an interdisciplinary field
of science [1-5]. Here is a brief list of key words and
notions that characterize it: Chaos and Order; Open systems;
Criteria for the relative degree of order in open systems;
Norm of chaos; Degradation and self-organization;
Synergetics; Diagnostics of open systems; Semiotic control;
Constructive role of dynamic instability of atomic
motion; Transition from reversible to irreversible equations;
Kinetic and hydrodynamic description of nonequilibrium
processes taking into account the structure of the
continuous medium; Nonequilibrium processes in active media;
Equilibrium and nonequilibrium phase transitions;
Unified kinetic description of laminar and turbulent motions;
Quantum open systems.

Certainly,  many  of  these  concepts  are  not  at  all  'new'. 
The  purpose  of  "Physics  of  open  systems"  is  to  elaborate 
ideas  and  methods  for  the  integrated  description  of  this 
broad  class  of  problems. 

Open systems can exchange energy, matter, and (last
but not least) information with the environment. We
shall consider only open macroscopic systems. They are
composed of many objects, the constituent structural
elements. These elements may be microscopic, e.g. atoms
or molecules in physical and chemical systems. They
may also be small but macroscopic, such as macromolecules
in polymers or cells in biological structures. Owing to
their complexity, open systems may host a variety of
structures. Dissipation plays a constructive role in the
formation of these structures. At first sight, this seems
surprising, because the concept of dissipation is associated
with the attenuation of various forms of motion, energy
scattering, and the loss of information. It is essential,
however, that dissipation is necessary for the
formation of structures in open systems. To emphasize
this, I. Prigogine coined the term "dissipative
structures" [6,7]. This comprehensive and exact term covers
all sorts of structures: temporal, spatial and space-time
structures. The latter are exemplified by autowaves.

The complexity of open systems provides ample
opportunity for cooperative phenomena to occur. In order
to emphasize the role of collective interactions in the
formation of dissipative structures, H. Haken introduced
the term "synergetics", which means joint action. The
objective of synergetics is to reveal common ideas, methods,
and laws in totally different fields of natural science,
sociology, and even linguistics [8,9].

Moreover, synergetics is an area where various special
disciplines cooperate. The scope of synergetics is
illustrated by a series of books under the common title
Synergetics, published by Springer Verlag. The latest issue
in this series is volume 67 [10].

Synergetics stems from thermodynamics and statistical
physics. This accounts for the word Physics being the
first in the title of this Section. It thereby emphasizes
that the theory of open systems is essentially based on
fundamental physical laws.

All systems that we are going to deal with are macroscopic.
This means that they consist of a large number of
elements. In many cases this allows us to treat such systems
as a continuous medium. Such a convention dramatically
changes the nature of the system. In order to avoid the
potential problems, it is necessary to use a physical
definition of a continuous medium rather than a formal
mathematical definition. This requires a concrete definition
of physically infinitesimal time and length scales in terms
of the characteristic parameters of the system. The
corresponding physically infinitesimal volume serves as a
physical "point".

III. PHYSICALLY INFINITESIMAL INTERVALS (SCALES)
OF LENGTH AND TIME IN A RAREFIED GAS AND A
RAREFIED PLASMA [1-3,11]

A.  Rarefied  gas.  Kinetic  level  of  description. 

Thus, a rarefied (Boltzmann) gas and a rarefied (Debye)
plasma are characterized by the dimensionless density
and plasma parameters:

$$ \varepsilon = n r_0^3 \ll 1, \qquad \mu = \frac{1}{n r_D^3} \ll 1. \tag{3.1} $$

These parameters also determine the connection between
the infinitesimal scales $\tau_{ph}$ and $l_{ph}$ and the
corresponding relaxation ("collision") parameters
$\tau = \tau_{rel}$ and $l = l_{rel}$ of the Boltzmann gas
and the Debye plasma.

We denote by $N_{ph} = n V_{ph}$ the number of particles
in the physically infinitesimal volume $V_{ph}$. By definition
of physically infinitesimal scales, the number of particles
within $V_{ph}$ is large (that is, $N_{ph} \gg 1$), and the
scales $\tau_{ph}$ and $l_{ph}$ are small compared to the
characteristic scales $T$ and $L$ ($\tau$ and $l$ for the
Boltzmann gas and the Debye plasma):

$$ \tau_{ph} \ll T, \qquad l_{ph} \ll L, \qquad N_{ph} \gg 1. \tag{3.2} $$


The definition of physically infinitesimal scales is not
universal: it depends on the adopted level of description
of nonequilibrium processes (kinetic, hydrodynamic,
diffusion).

Boltzmann gas. The following relationships between
characteristic lengths hold for a rarefied gas:

$$ r_0 \ll r_{av} \ll l, \qquad \varepsilon = n r_0^3 \ll 1. \tag{3.3} $$

The time of transition to local equilibrium is
determined by the "collision time" $\tau$. The relevant length
scale is the mean free path $l$. So, for the kinetic stage of
relaxation we set in formulas (3.2):

$$ T \to \tau, \qquad L \to l. \tag{3.4} $$

At first sight, this attempt to regard a rarefied gas as a
continuous medium may seem paradoxical. To show that
this is not so, we proceed as follows.

Divide $\tau$, the average time interval between two
consecutive collisions of a given particle, by the number
$N_{ph} = n V_{ph}$. The resulting value is the time between
consecutive collisions of any particle contained within
the physically infinitesimal volume $V_{ph}$. It would be
natural to take this as the definition of $\tau_{ph}$.
Using the relationship $\tau_{ph} = l_{ph}/v_T$ for the kinetic
stage of description, we obtain two equations:

$$ \frac{\tau}{N_{ph}} \sim \frac{\tau}{n l_{ph}^3} = \tau_{ph}, \qquad \tau_{ph} = \frac{l_{ph}}{v_T}. \tag{3.5} $$

Hence, making use of the definitions of $l$ and $\varepsilon$,
we obtain concrete estimates for the physically infinitesimal
scales:

$$ \tau_{ph} \sim \sqrt{\varepsilon}\,\tau \ll \tau, \qquad l_{ph} \sim \sqrt{\varepsilon}\,l \ll l, \qquad N_{ph} \sim \frac{1}{\sqrt{\varepsilon}} \gg 1. \tag{3.6} $$
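
The step from (3.5) to (3.6) can be spelled out as follows. This is a sketch that assumes the standard order-of-magnitude estimates $l \sim 1/(n r_0^2)$ for the mean free path and $\tau = l/v_T$ for the collision time; the text uses them implicitly but does not write them out.

```latex
\begin{align*}
\frac{\tau}{n\,l_{ph}^{3}} = \frac{l_{ph}}{v_T},\quad \tau = \frac{l}{v_T}
  &\;\Longrightarrow\; l_{ph}^{4} = \frac{l}{n}, \\
l \sim \frac{1}{n r_0^{2}},\quad \varepsilon = n r_0^{3}
  &\;\Longrightarrow\; \frac{1}{n} \sim \varepsilon^{2} l^{3}, \\
l_{ph}^{4} \sim \varepsilon^{2} l^{4}
  &\;\Longrightarrow\; l_{ph} \sim \sqrt{\varepsilon}\,l,\quad
     \tau_{ph} = \frac{l_{ph}}{v_T} \sim \sqrt{\varepsilon}\,\tau, \\
N_{ph} = n\,l_{ph}^{3} \sim \varepsilon^{3/2}\, n l^{3}
  &\;\sim\; \varepsilon^{3/2}\,\varepsilon^{-2} = \frac{1}{\sqrt{\varepsilon}}.
\end{align*}
```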

In hydrodynamics the relaxation times are expressed
in terms of the external parameter $L$ and one of the
three dissipative coefficients: diffusion $D$, viscous
friction $\nu$, and thermal conductivity $\chi$. All these
dissipative processes are of diffusion type; accordingly, the
"coefficient of diffusion" is represented by one of the
coefficients $D$, $\nu$, and $\chi$. Then the time of relaxation
by diffusion is

$$ \tau_D = \frac{L^2}{D}, \qquad \text{where } D = D,\ \nu, \text{ or } \chi. \tag{3.7} $$

For diffusion processes the linkage between physically
infinitesimal scales is given, instead of the "kinetic
relationship" $\tau_{ph} = l_{ph}/v_T$, by the corresponding
"gasdynamic" ("G") relationship

$$ \tau_{ph}^{G} = \frac{(l_{ph}^{G})^2}{D}, \qquad \text{where } D = D,\ \nu, \text{ or } \chi. \tag{3.8} $$

Observe that, by this definition, "traces" of diffusive
(hydrodynamic) motion are preserved also within the
physically infinitesimal volume $V_{ph}$, that is, in a "point"
of the continuous medium.

Hence, making use of the definitions (3.7), (3.8), we
obtain the concrete estimates of physically infinitesimal
scales for the gasdynamic level of description:

$$ \tau_{ph}^{G} \sim \frac{\tau_D}{N^{2/5}} \ll \tau_D, \qquad l_{ph}^{G} \sim \frac{L}{N^{1/5}} \ll L, \qquad N_{ph}^{G} \sim N^{2/5} \gg 1. \tag{3.9} $$

Here we have used the relation $n L^3 = N$.
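
The gasdynamic estimates (3.9) follow by the same argument as in the kinetic case. The sketch below assumes, by analogy with (3.5), that the physically infinitesimal time is the diffusion time divided by the number of particles in the "point"; this analogue is an assumption made here for illustration, since the text does not state it explicitly.

```latex
\begin{align*}
\frac{\tau_D}{N_{ph}^{G}} \sim \tau_{ph}^{G},\quad
N_{ph}^{G} = n\,(l_{ph}^{G})^{3},\quad
\tau_D = \frac{L^{2}}{D},\quad
\tau_{ph}^{G} = \frac{(l_{ph}^{G})^{2}}{D}
  &\;\Longrightarrow\; (l_{ph}^{G})^{5} \sim \frac{L^{2}}{n} = \frac{L^{5}}{N}, \\
  &\;\Longrightarrow\; l_{ph}^{G} \sim \frac{L}{N^{1/5}}, \\
\tau_{ph}^{G} = \frac{(l_{ph}^{G})^{2}}{D} \sim \frac{L^{2}}{D\,N^{2/5}} = \frac{\tau_D}{N^{2/5}},
  &\qquad N_{ph}^{G} = n\,(l_{ph}^{G})^{3} \sim \frac{N}{N^{3/5}} = N^{2/5}.
\end{align*}
```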

We see that, unlike the case of (3.6) for the kinetic
description, now the physically infinitesimal scales depend
on the external parameter $L$. This is a clear indication
that the definition of physically infinitesimal scales for
the Boltzmann gas depends on the adopted level of
description.

Naturally, the gasdynamic description is rougher than
the kinetic one, and so the "point" of continuous medium
in hydrodynamics is larger, that is, $V_{ph} < V_{ph}^{G}$.
The transition from the kinetic equation to the "rougher"
gasdynamic description can be made in the following manner.
For this transition one usually uses perturbation theory
in the Knudsen number (the methods of Hilbert,
Chapman-Enskog and Grad):

$$ Kn = \frac{l}{L}. \tag{3.10} $$


However, in this way we meet serious difficulties in
applying this perturbation theory. They are due to
the fact that the parameter Kn does not reflect the structure
of the "continuous medium".

Instead of the Knudsen number it is more natural to use
the "physical Knudsen number". For the kinetic level of
description it is defined by the expression:

$$ K_{ph} = \frac{l_{ph}}{l} \sim \frac{1}{N_{ph}} \ll 1. \tag{3.11} $$

The smallness of this parameter is ensured by the condition
$N_{ph} \gg 1$.

In the limit of free molecular flow ("collisionless flow"),
when the characteristic length $L$ (for instance, the
diameter of a pipe) is much smaller than the free path $l$,
the approximation of a continuous medium may be used
as long as

$$ l_{ph} \ll L \ll l. \tag{3.12} $$

In the case of hydrodynamic description, the physically
infinitesimal scales are defined by expressions (3.9), and
the physical Knudsen number is

$$ K_{ph}^{G} = \frac{l_{ph}^{G}}{L} \sim \frac{1}{(N_{ph}^{G})^{1/2}} \ll 1. \tag{3.13} $$

The smallness of this parameter is ensured by the condition
$N_{ph}^{G} \gg 1$.


For the unified kinetic and gasdynamic description the
size of a "point" is defined by the relations:

$$ L_{\min} \sim \sqrt{N_{ph}}\, l_{ph} \sim \frac{l}{\sqrt{N_{ph}}}. \tag{3.14} $$


We see that the minimum length $L_{\min}$, the minimum
size of a point for which the traces of diffusive motion are
still retained and the gasdynamic description is still
feasible, is less than the mean free path $l$ and greater than
the kinetic physically infinitesimal scale $l_{ph}$. So, a
unified description of kinetic and hydrodynamic processes is
possible in a broad range of Knudsen numbers without using
perturbation theory.

The relevant characteristic time is defined as

$$ (\tau_{ph}^{G})_{\min} \sim \frac{L_{\min}^2}{D} \sim \sqrt{\varepsilon}\,\tau \sim \tau_{ph} \tag{3.15} $$


and is of the order of the physically infinitesimal time
interval for the kinetic description. We shall need this
result for the generalized kinetic equation for the unified
description of kinetic and gasdynamic processes in rarefied
gases and plasmas.

The relationships obtained above allow us to estimate
the number of particles within a "point":

$$ N_{\min} = n L_{\min}^3 \sim \varepsilon^{-5/4}. \tag{3.16} $$

Hence, at normal conditions, when $\varepsilon \sim 10^{-4}$,
the number of particles within a "point" is
$N_{\min} \sim 10^{5}$.
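
As a quick numerical check of these orders of magnitude, the estimates can be evaluated for a given density parameter. The sketch below is illustrative only: the function name `point_scales` and its dictionary keys are hypothetical, all prefactors of order unity are dropped, and the scaling of $L_{\min}$ as $\varepsilon^{1/4} l$ is inferred from (3.15) under the assumption $D \sim v_T l$.

```python
import math


def point_scales(eps):
    """Order-of-magnitude scales of a physical "point" in a rarefied gas.

    eps is the density parameter n * r0**3. Lengths are returned in units
    of the mean free path l, times in units of the collision time tau.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("a rarefied gas requires 0 < eps < 1")
    return {
        # Kinetic level, estimates (3.6)
        "l_ph_over_l": math.sqrt(eps),
        "tau_ph_over_tau": math.sqrt(eps),
        "N_ph": 1.0 / math.sqrt(eps),
        # Unified description: L_min**2 / D ~ sqrt(eps) * tau, with the
        # assumed D ~ v_T * l = l**2 / tau, gives L_min ~ eps**0.25 * l
        "L_min_over_l": eps ** 0.25,
        # Number of particles in the minimal "point", estimate (3.16)
        "N_min": eps ** -1.25,
    }


if __name__ == "__main__":
    for name, value in point_scales(1e-4).items():
        print(f"{name:16s} {value:.3g}")
```

For $\varepsilon = 10^{-4}$ this gives about $10^{2}$ particles in the kinetic "point" and about $10^{5}$ in the minimal "point" of the unified description, with $l_{ph} < L_{\min} < l$ as stated in the text.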

Rarefied Coulomb plasma (Debye plasma). For a
rarefied plasma the unified description of kinetic and
hydrodynamic processes is better justified than for a
rarefied gas. The reason is the collective character of
the interaction of charged particles.

For a rarefied plasma there are two possibilities for the
definition of the physically infinitesimal scales. The
first is similar to the kinetic definition (3.6) for a
rarefied gas.

However,  for  a  plasma  it  is  a  more  natural  to  adopt 
the  following  definition  of  physically  infinitesimal  scales: 


$$l_{ph} \sim r_D, \qquad D \sim v_T l. \tag{3.17}$$


Here, as for a rarefied gas, the three kinetic coefficients
(diffusion $D$, viscous friction $\nu$, and thermal conductivity
$\chi$) are equal. We see that now the connection between the
physically infinitesimal scales is defined by the "gasdynamical
relation"


$$\tau_{ph} \sim \frac{l_{ph}^2}{D}. \tag{3.18}$$


Thus, for a rarefied plasma there is a wide region of
parameters (wider than for a rarefied gas) for the
unified description of kinetic and gasdynamical processes.
Now the Debye radius plays the role of the length $L_{\min}$.
The physical Knudsen number and the maximal usual
Knudsen number for a plasma are defined by the following
formulas:

$$K_{ph} = \frac{l_{ph}}{L} \sim \frac{r_D}{L}, \qquad (K_n)_{\max} = \frac{l}{r_D} \gg 1. \tag{3.19}$$

The last inequality opens a possibility for the unified
description of kinetic and hydrodynamic processes in a
rarefied plasma on scales $L > r_D$.

We  have  now  enough  information  about  the  structure 
of  a  rarefied  plasma  as  a  continuous  medium  in  order  to 
construct  the  corresponding  kinetic  equations. 

IV.  RELATIVE  ORDERING  CRITERIA  IN  OPEN 
SYSTEMS  [1-5,12] 

A.  Degradation  and  self-organization  in  evolution 

Evolution  is  the  process  of  changes  and  development 
in  nature  and  society.  Worded  in  this  manner,  it  is  a 
very  general  concept.  In  physically  closed  systems,  evo- 
lution in  time  results  in  the  equilibrium  state  to  which 
maximum  entropy  and  the  maximum  degree  of  chaos  cor- 
respond. 

In  open  systems,  it  is  possible  to  distinguish  two  classes 
of  evolutionary  processes: 

(1)  Development  in  time  towards  the  nonequilibrium 
stationary  state 

(2)  Evolution  via  a  series  of  nonequilibrium  stationary 
states  of  an  open  system.  The  latter  process  is  due  to 
variation  of  the  so-called  control  (governing)  parameters. 

Darwin's  theory  of  evolution  is  based  on  the  princi- 
ple of  natural  selection.  Thus,  evolution  can  either  lead 
to  degradation  or  represent  a  self-organization  process 
during  which  more  complex  and  sophisticated  structures 
arise.  Can  selforganization  be  the  unique  outcome  of  evo- 
lution? The  answer  is  negative  because  neither  physical 
nor  even  biological  systems  display  an  "intrinsic  drive" 
towards  selforganization.  Therefore,  evolution  may  also 
lead  to  degradation.  A  physical  example  is  evolution  to 
the  equilibrium  state,  the  most  chaotic  one,  according  to 
Boltzmann. 

Thus, self-organization is only one of the possible routes
of evolution. Criteria for self-organization are needed to
answer the question of which route a process will follow,
but fundamental concepts such as degradation and
self-organization need not necessarily be defined. Defining
them is very difficult, quite apart from the ambiguity of such
definitions. Of much greater importance is the comparative
analysis of the relative degree of order (or chaos) in
different states of the open system being examined. Only
such analysis can answer the question whether the open
system undergoes self-organization or degradation.

We  have  already  emphasized  the  concepts  of  chaos  and 
order.  Now,  what  distinguishes  order  from  chaos? 


There  are  cases  when  the  difference  between  them  is 
clear.  However,  it  appears  from  the  comparison  of  lam- 
inar and  turbulent  flows  that  a  seemingly  obvious  infer- 
ence may  turn  out  to  be  incorrect.  Quantitative  criteria 
for  the  relative  degree  of  order  (or  chaos)  in  different 
states  of  open  systems  allow  a  more  reliable  conclusion 
to  be  obtained. 

The results of such analysis are objective and provide
additional information which constitutes the basis for the
establishment of the "norm of chaos" and helps to reveal
deviations from the norm in either direction under the
influence of various impacts. In biology, for instance, all
kinds of stress may cause deviations in the degree of chaos
from the norm. Deviations on either side suggest "pathology",
i.e. represent the process of degradation.

Therefore, a finding (based on a selected criterion)
that the degree of chaos has decreased does not necessarily
mean that self-organization occurs; conversely, an increase in
the degree of chaos cannot always be identified with
degradation. Such an identification is valid only for those
physical systems where thermal equilibrium may be taken as the
reference point for the degree of chaos. For example, in
an open system such as a generator of electrical oscillations,
the equilibrium state is that of zero feedback, when
only thermal fluctuations exist in the electrical circuit.

Since  an  organism  normally  functions  only  if  a  certain 
norm  of  chaos  is  available,  corresponding  to  an  essentially 
nonequilibrium  state,  the  above  reference  point  is  non- 
existent. This  accounts  for  the  lack  of  objective  informa- 
tion about  variation  of  the  degree  of  chaos  in  biology  as 
well  as  in  economics  and  sociology  which  hampers  dis- 
tinguishing between  self-organization  and  degradation  in 
such  systems. 

However,  another  classification  is  equally  relevant.  The 
norm  of  chaos  being  determined,  deviations  on  either 
side  may  be  regarded  as  "pathology",  i.e.  degradation. 
Therefore,  it  is  possible  to  monitor  the  choice  of  "ther- 
apy". Here,  a  criterion  for  the  relative  degree  of  or- 
der is  at  stake  again.  If  the  'treatment'  normalizes  the 
state  of  the  open  system,  in  terms  of  this  criterion,  self- 
organization  occurs.  Otherwise,  "therapy"  leads  to  fur- 
ther degradation. 

The  difficulty  in  introducing  the  relative  measure  of 
order  (or  chaos,  for  that  matter)  in  open  systems  is 
in  the  first  place  due  to  the  absence  of  definitions  for 
the  initial  concepts:  chaos,  order,  degradation,  and  self- 
organization.  It  has  already  been  mentioned  that  the  def- 
initions of  these  concepts  are  to  a  great  extent  arbitrary. 
We  have  just  noted  that  the  transition  to  a  more  chaotic 
state  should  not  necessarily  be  regarded  as  degradation. 
It  is  essential  to  consider  deviations  from  the  norm  of 
chaos. 

In  this  context,  it  appears  useful  to  consider  the  prin- 
cipal concepts  at  greater  length  in  order  to  formulate  the 
criterion  for  the  relative  degree  of  order  without  which 
the  notions  of  degradation  and  self-organization  actually 
remain  void  of  meaning.

B.  Physical  and  dynamical  chaos

Although "chaos" and "chaotic motion" are fundamental
physical concepts, their precise definitions are lacking.
According to Boltzmann, motion in an equilibrium state
is the most chaotic. However, motions far from equilibrium
are also called chaotic. Such is the "motion" in noise
generators intended for signal suppression.

Normally,  different  forms  of  turbulent  motion  in  gases 
and  fluids  are  also  described  as  chaotic,  e.g.  turbulent 
motion  in  pipes  which  arises  from  laminar  motion  when 
the  pressure  difference  at  the  ends  of  the  pipe  is  suffi- 
ciently large.  It  seems  natural  to  conceive  turbulent  mo- 
tion as  being  more  chaotic  than  laminar  motion.  How- 
ever, such  a  view  largely  stems  from  the  confusion  of  the 
concepts  of  complexity  and  chaos.  Observation  of  tur- 
bulent motion  primarily  reveals  its  complexity,  whereas 
additional  analysis  is  needed  to  estimate  the  degree  of 
chaos  and  appropriate  criteria  to  quantify  it. 

Of  late,  the  concept  of  "dynamical  chaos"  has  been 
extensively  exploited  to  characterize  complex  motions  in 
relatively  simple  dynamic  systems.  The  word  "dynamic" 
implies  the  absence  of  fluctuation  sources,  that  is  sources 
of  disorder. 

For this reason, the "dynamic system" concept is a
somewhat idealized one. A more realistic chaotic motion,
with random sources taken into account, might be called
"physical chaos". An example is the chaotic motion of
atoms and molecules in equilibrium. The mathematical
notion of "dynamical chaos" can be traced back to the
works of H. Poincaré and A. N. Kolmogorov.

The  main  feature  of  dynamical  chaos  is  dynamic  insta- 
bility of  motion.  It  is  expressed  as  strong  (exponential) 
divergence  of  the  originally  close  trajectories  and  leads 
to  their  mixing. 
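
The exponential divergence of initially close trajectories can be seen in a few lines of code. The following is a minimal sketch that uses the logistic map $x \to 4x(1-x)$ as a stand-in dynamic system; the map, the starting point and the function names are assumptions chosen for illustration and do not come from the text.

```python
def logistic(x: float) -> float:
    """Fully chaotic logistic map x -> 4x(1 - x) (hypothetical test system)."""
    return 4.0 * x * (1.0 - x)


def separation_history(x0: float, delta0: float, steps: int) -> list:
    """Distance between two trajectories started delta0 apart, step by step."""
    x, y = x0, x0 + delta0
    history = [abs(y - x)]
    for _ in range(steps):
        x, y = logistic(x), logistic(y)
        history.append(abs(y - x))
    return history


if __name__ == "__main__":
    seps = separation_history(0.2, 1e-10, 60)
    for step in range(0, 61, 10):
        print(step, seps[step])
```

The separation grows roughly geometrically until it reaches the size of the whole interval, after which the two trajectories are effectively unrelated. This is the mixing referred to above.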

A major contribution to the investigation into the
relationship between the dynamical and statistical
descriptions of complicated motion was made by the
prematurely deceased N. S. Krylov [13]. He was the first to raise
the question of the role of dynamic instability and mixing
as the basis of statistical physics in his posthumously
published book Works on the Foundations of Statistical
Physics (1950).

C.  Constructive  role  of  dynamic  instability  of  motion 

Relatively simple dynamic systems can give rise to very
complicated motions perceived as chaotic. This was the
reason for the introduction of such new concepts as the
strange attractor and dynamical (or deterministic) chaos.

As a rule, the word "chaos" has negative connotations
in physics, biology and economics. However, the concept
of chaos is a many-faceted one. For example, life can exist
neither in complete chaos nor in perfect order. A normal
organism needs a certain norm of the degree of chaos,
which can be estimated using the criterion of the relative
degree of chaos.

Given an opportunity to measure the relative degree
of chaos, the word requires no additional attributes. It is
therefore appropriate to ask whether the term "dynamical
chaos" is relevant. In fact, it was coined to characterize
the complex states which arise from dynamic instability,
i.e. exponential divergence of trajectories under
a minor change of the initial conditions. However, this
term is somewhat in conflict with the fact that
trajectories computed from dynamic equations can be
reproduced from the initial data in a numerical
experiment. Moreover, dynamic instability can play a
constructive role in the physics of open systems.

Indeed, dynamic instability of motion, i.e. exponential
divergence of the trajectories, leads to mixing of
trajectories in phase space. This accounts for the possibility
of smoothing at physically infinitesimal scales and of
introducing the concept of "continuous medium" to pass from
reversible microscopic equations of motion for gas
particles to irreversible kinetic and hydrodynamic equations
for macroscopic functions of the continuous medium.

In this approach, the atomic structure of a system
is taken into account to define "a point of continuous
medium". This requires that physically infinitesimal
time and length scales be defined, as well as the
corresponding physically infinitesimal volume which actually
stands for the "point" of a continuous medium.

Naturally,  it  is  desirable  that  such  definitions  agree 
with  the  definition  of  the  minimum  mixing  region  and 
the  minimum  time  for  the  development  of  dynamic  in- 
stability. 

D.  Criterion  for  the  relative  degree  of  order. 
S-theorem 

Of  all  macroscopic  functions,  only  entropy  S  possesses 
a  combination  of  properties  that  allow  it  to  be  used  as  a 
measure  of  uncertainty  (chaos)  in  the  statistical  descrip- 
tion of  processes  in  macroscopic  systems. 

Boltzmann gave a statistical definition of entropy for
both equilibrium and nonequilibrium (irreversible)
processes and proved the famous H-theorem.

It  states:  The  entropy  of  a  system  tending  to  an  equi- 
librium state  grows  and  remains  unaltered  after  equilib- 
rium is  attained.  According  to  Boltzmann,  the  degree  of 
chaos  monotonically  increases  during  evolution  and  has 
the  maximum  value  at  equilibrium,  entropy  being  a  mea- 
sure of  uncertainty  (chaos). 
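
The monotonic growth of entropy towards equilibrium can be illustrated on a toy model. The following is a minimal sketch, assuming a lazy symmetric random walk on a ring of states as the relaxation process; this is not the Boltzmann equation, and the model and function names are hypothetical. Because the transition rule is doubly stochastic, the Shannon entropy of the distribution cannot decrease from step to step.

```python
import math


def entropy(p):
    """Shannon entropy -sum p ln p of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def relax_step(p):
    """One step of a lazy symmetric random walk on a ring.

    Each state keeps half of its probability and gives a quarter to each
    neighbour, so the transition matrix is doubly stochastic.
    """
    n = len(p)
    return [0.5 * p[i] + 0.25 * p[i - 1] + 0.25 * p[(i + 1) % n]
            for i in range(n)]


def entropy_history(p, steps):
    """Entropy of the distribution after 0, 1, ..., steps relaxation steps."""
    history = [entropy(p)]
    for _ in range(steps):
        p = relax_step(p)
        history.append(entropy(p))
    return history


if __name__ == "__main__":
    start = [1.0] + [0.0] * 7  # all probability in one state
    for step, s in enumerate(entropy_history(start, 40)):
        if step % 10 == 0:
            print(step, s)
    print("maximum possible:", math.log(8))
```

The entropy rises from zero and saturates at $\ln 8$, the value for the uniform (equilibrium) distribution over eight states.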

In this context, it is essential that during the evolution
of a closed system to equilibrium in compliance with
the Boltzmann equation, the average energy $\langle E \rangle$ remains
constant. This means that fluctuations of energy are
allowed. This points to the internal non-closedness of the
Boltzmann gas system, which is due to the existence of an
infinite (in the thermodynamic limit) buffer of internal
degrees of freedom. Boltzmann's kinetic equation only contains
a small fraction of the total information about the system.
However, conservation of average energy in the course of
evolution is not a common property of all kinetic equations.

It  is  only  natural  that  the  criterion  for  the  relative  de- 
gree of  order  needs  to  be  universal.  There  is  no  reason  it 
should  be  applicable  only  to  a  class  of  systems  where  av- 
erage energy  is  conserved  during  evolution.  Which  route 
may  lead  to  the  solution  of  this  problem? 

Entropy  being  the  sole  function  with  the  properties  of  a 
measure  of  chaos,  there  is  but  one  option.  It  is  necessary 
to  redefine  entropy  so  that  the  average  energy  remains 
constant  in  the  course  of  evolution. 

We  shall  consider  the  evolution  of  stationary  states 
in  open  systems  at  slowly  changing  governing  parame- 
ters. It  is  for  this  type  of  evolution,  that  the  criterion 
for  the  relative  degree  of  order  in  various  states  of  open 
systems  will  be  introduced  below.  This  criterion  was  for 
the  first  time  formulated  by  the  author  and  called  the 
"S-theorem". 

Let us consider the evolution of stationary states of an
open system at a varying control parameter $a$. Let us
further assume the existence of two states with the control
parameters $a = a_0$ and $a = a_1$, e.g. the stationary states of
the Van der Pol generator at different values of the
feedback parameter. Of course, the description of generation
should take into account both current and charge
fluctuations. Then, thermal fluctuations of the current and
the charge in the electrical circuit correspond to the
former parameter value, when feedback is absent. The
developed generation state, with the feedback parameter
considerably exceeding the threshold, corresponds to the
latter value.

In the general case, the degree of order of the
distinguished states differs, so that one of them is
more chaotic than the other. Let us term the more chaotic
state "physical chaos". As a rule, this state is nonequilibrium
and more ordered than the equilibrium state. However, in
the case of a generator with $a = a_0$, it coincides with the
equilibrium state.

Let us denote the macroscopic characteristic of the
stationary state as $X$. The role of $X$ for a generator can be
played by the oscillation energy $E$. Let us further denote
the distribution functions of two distinguished states as
$f_0, f_1$ and the corresponding entropy values as $S_0, S_1$.

In the general case, there is no such notion as energy
for an open system and only the effective energy can be
introduced. It may just as well be termed the effective
Hamilton function and denoted as $H_{eff}$. It is defined
by the distribution function of the physical chaos state,
$H_{eff} = -\ln f_0$, and as a rule its average value changes
with the control parameter. For this reason, the functions
$f_0$, $S_0$ must be replaced by the corresponding renormalized
values $\tilde f_0$, $\tilde S_0$ if the entropy difference
$\tilde S_0 - S_1$ is to be used as a measure of the
relative degree of order in the distinguished states. These
new values are determined from the condition that the
average effective Hamilton function be equal in the
examined states.

When the physical chaos state coincides with the
equilibrium state, as is the case with the Van der Pol
generator, renormalization is carried out by replacing the
temperature $T$ with the new value $\tilde T$. It is determined from
the solution of the equation which expresses the equality
of the average effective Hamilton functions in
the two states of interest. This equation has the form:

$$\int H_{eff}\,\tilde f_0(X, a = a_0)\,dX = \int H_{eff}\, f_1(X, a = a_1)\,dX. \tag{4.1}$$

Given  the  correct  choice  of  the  'physical  chaos  state', 
the  solution  of  this  equation  has  the  form: 

$$\tilde T(a) \geq T. \tag{4.2}$$

The equality sign applies at $a = a_0$, i.e. for the
state of physical chaos. Evidently, the state "0" should
be "heated" to equalize the average energies. As the
comparison is now made at identical values of the average
effective energy, the entropy difference $\tilde S_0 - S_1$ can serve
as a measure of the relative degree of order in the
distinguished states.

The  renormalized  distribution  function  may  be  pre- 
sented in  the  form  of  the  canonical  Gibbs  distribution: 

$$\tilde f_0(X) = \exp\frac{\tilde F - H_{eff}(X)}{\tilde T}. \tag{4.3}$$

The  corresponding  expression  for  the  Boltzmann- 
Gibbs-  Shannon  entropy  is  derived  from  the  equation 

$$\tilde S_0 = -\int \ln\big(\tilde f_0(X)\big)\, \tilde f_0(X)\,dX. \tag{4.4}$$

Now,  let  us  turn  back  to  the  equation  (4.1).  If  the 
"0"  state  is  coincident  with  equilibrium,  its  solution  has 
the  form  (4.2),  with  T  standing  for  the  temperature.  In 
the  general  case,  the  "0"  state,  i.e.  the  state  of  physical 
chaos,  is  a  nonequilibrium  one.  The  distribution  (4.3) 
includes  effective  temperature  which  is  equal  to  unity  for 
the  physical  chaos  state.  Therefore,  the  solution  (4.2) 
should  be  written  in  the  form 

$$\tilde T(a) \geq 1. \tag{4.5}$$

Here, the equality sign also corresponds to the state of
physical chaos. However, both the effective temperature and
the free energy are now dimensionless. If the inequality (4.5)
is valid at control parameter values $a > a_0$, the choice of
the state "0" as physical chaos is correct,
and the relative degree of order in the distinguished states
is defined by the difference between the corresponding
entropies.

Using the expression (4.3) for the distribution function
$\tilde f_0(X)$ and the constancy condition for the average
effective Hamilton function, the expression for the entropy
difference may be presented as the inequality:

$$\tilde S_0 - S_1 = \int \ln\frac{f_1(X)}{\tilde f_0(X)}\, f_1(X)\,dX \geq 0, \tag{4.6}$$

on the condition that

$$\langle H_{eff} \rangle = \mathrm{const}. \tag{4.7}$$

The known inequality $\ln a \geq 1 - 1/a$ at $a = f_1/\tilde f_0$ is used
to derive the formula (4.6).
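
The derivation of (4.6) can be written out step by step. The following is a sketch, assuming that the renormalized distribution has the Gibbs form $\tilde f_0 = \exp[(\tilde F - H_{eff})/\tilde T]$, that $\tilde f_0$ and $f_1$ are both normalized to unity, and that $\langle H_{eff} \rangle_0$ and $\langle H_{eff} \rangle_1$ denote averages over $\tilde f_0$ and $f_1$ respectively (this subscript notation is introduced here only for the sketch):

$$
\begin{aligned}
\tilde S_0 &= -\int \tilde f_0 \ln \tilde f_0 \, dX
            = -\frac{\tilde F - \langle H_{eff} \rangle_0}{\tilde T}, \\
-\int f_1 \ln \tilde f_0 \, dX
           &= -\frac{\tilde F - \langle H_{eff} \rangle_1}{\tilde T}
            = \tilde S_0
            \quad \text{because } \langle H_{eff} \rangle_0 = \langle H_{eff} \rangle_1, \\
\tilde S_0 - S_1
           &= -\int f_1 \ln \tilde f_0 \, dX + \int f_1 \ln f_1 \, dX
            = \int f_1 \ln \frac{f_1}{\tilde f_0} \, dX \\
           &\geq \int f_1 \left( 1 - \frac{\tilde f_0}{f_1} \right) dX
            = 1 - 1 = 0 .
\end{aligned}
$$

The last line applies $\ln a \geq 1 - 1/a$ pointwise with $a = f_1/\tilde f_0$ and then uses the normalization of both distributions.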

To  summarize,  the  result  of  computing  the  relative 
degree  of  order  in  the  two  distinguished  nonequilibrium 
states  is  represented  by  two  inequalities.  One  (4.5)  con- 
firms the  correct  choice  of  the  "0"  state  as  presenting 
physical  chaos.  Given  the  opposite  inequality,  physical 
chaos  would  be  represented  by  the  state  "1".  The  for- 
mula (4.6)  provides  a  quantitative  measure  of  the  relative 
degree  of  order  in  the  distinguished  states. 

The above calculations were made for the case of a single
control parameter $a$. When several control parameters are
available, it is possible to optimize the search for the most
ordered state.

Let  us  now  apply  the  "S-theorem"  to  estimate  the  rel- 
ative degree  of  order  upon  the  transition  from  laminar  to 
turbulent  flow. 


E.  Relative  ordering  of  laminar  and  turbulent  flows 

[2,3,14] 

Calculations  based  on  the  S-theorem  allowed  the  gen- 
eral results  (4.5),  (4.6)  to  be  specified  for  the  case  of 
transition  from  the  laminar  flow  in  a  pipe  to  the  steady 
state  turbulent  flow. 

It  will  be  clear  from  the  forthcoming  discussion  that 
laminar  flow  may  be  assumed  to  represent  the  state  of 
physical  chaos.  The  role  of  the  effective  Hamilton  func- 
tion is  played  by  the  average  kinetic  energy  of  the  laminar 
flow. For the equality of this energy in both laminar and
turbulent flows to be true, the laminar flow should be
"warmed up":

$$k_B \tilde T_{lam} = k_B T_{turb} + \frac{m}{3}\langle (\delta u)^2 \rangle > k_B T_{lam}. \tag{4.8}$$

The  temperature  difference  is  defined  by  the  sum  of 
squared  diagonal  elements  in  the  Reynolds  stress  tensor. 
Reynolds'  stress  representing  collective  degrees  of  free- 
dom, the  equality  (4.8)  may  be  interpreted  as  indicating 
that  a  part  of  thermal  (chaotic)  motion  is  replaced  by 
the  collective  degrees  of  freedom  during  the  transition 
from  laminar  to  turbulent  flow.  This  justifies  the  choice 
of  the  Reynolds  stress  tensor  as  the  order  parameter  of 
the  turbulent  flow. 

The  result  (4.6)  which  in  the  present  case  defines  the 
relative  degree  of  order  in  laminar  and  turbulent  flows 
has  the  form: 

$$T(\tilde S_{lam} - S_{turb}) = \frac{mn}{2}\langle (\delta u)^2 \rangle > 0. \tag{4.9}$$


Thus,  the  entropy  of  the  turbulent  flow  is  lower  than 
that  of  the  laminar  one.  This  implies  a  higher  degree  of 
order  in  the  turbulent  flow.  Here,  the  role  of  the  con- 
trol parameter  is  played  by  the  pressure  difference  at  the 
ends  of  the  pipe.  At  its  zero  value,  the  fluid  is  in  an  equi- 
librium state  characterized  by  the  maximum  degree  of 
chaos.  This  is  another  important  example  of  a  physical 
system  in  which  the  equilibrium  state  is  taken  as  the  ref- 
erence point  for  the  degree  of  chaos.  One  more  example 
is  the  Van  der  Pol  generator. 

When  the  pressure  difference  is  other  than  zero,  all 
states  are  better  ordered.  This  adds  weight  to  the  argu- 
ment, in  accordance  with  what  is  said  in  Section  4.4,  that 
the  transition  from  laminar  to  turbulent  flow  is  an  exam- 
ple of  the  selforganization  process.  However,  this  does 
not  mean  that  the  degree  of  order  grows  monotonically 
with  increasing  Reynolds  number. 

A  higher  degree  of  organization  of  the  turbulent  motion 
compared  with  that  of  the  laminar  one  is  also  apparent 
as  demonstrated  below. 

The  momentum  transfer  between  layers  in  a  laminar 
flow  is  mediated  through  a  molecular  mechanism  which 
consists  in  independent  changes  of  momenta  of  individual 
gas  particles. 

Conversely, in the case of a turbulent flow, the
momentum transfer from one layer to another is a collective
process. In other words, individual disorganized motion
in a laminar flow changes, upon transition to the
turbulent flow, into collective (hence, more organized) motion.

This  results  in  the  turbulent  viscosity  coefficient  be- 
ing much  higher  than  the  corresponding  parameter  for  a 
laminar  flow. 

A  higher  degree  of  order  in  turbulent  motion  is  also 
confirmed  by  calculations  of  entropy  production. 

F.  Estimation  of  the  relative  degree  of  order  from 
experimental  data 

Practical  application  of  the  S-theorem  implies  that  the 
effective  Hamilton  function  is  known.  It  is  easy  to  find 
provided  a  mathematical  model  of  the  process  in  question 
is  available.  In  many  cases,  however,  there  are  no  ade- 
quate mathematical  models  for  open  physical  systems. 
This  problem  is  even  more  complicated  as  far  as  biologi- 
cal, social,  and  economic  entities  are  concerned. 

Therefore,  it  is  sometimes  necessary  to  be  able  to  de- 
termine the  relative  degree  of  order  in  open  systems  di- 
rectly from  experimental  data.  This  can  be  achieved  in 
the  following  way: 

1. By selecting control parameters for a given system,
e.g. two states of the system with control parameters $a_0$
and $a_0 + \Delta a$.

2. By experimentally obtaining sufficiently long
temporal realizations for the chosen values of the governing
parameters:

$$X_0(t, a_0), \qquad X(t, a_0 + \Delta a). \tag{4.10}$$

These data are loaded into a computer and used to
construct the corresponding distribution functions:

$$f_0(X, a_0), \qquad f(X, a_0 + \Delta a). \tag{4.11}$$

The two distributions are normalized to unity.
Further operations are as above.
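
The procedure above can be sketched in code. The following is a minimal illustration, not the author's implementation. It assumes histogram estimates of the two distributions on common bins, an additive smoothing constant `alpha` that keeps $-\ln f_0$ finite, a bisection search for the effective temperature, and synthetic test signals. All function names, parameter values and signals are hypothetical.

```python
import math
import random


def histogram(samples, lo, hi, bins, alpha=0.5):
    """Bin probabilities on [lo, hi], normalized to unity.

    alpha is a hypothetical smoothing constant so that no bin is empty.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    total = len(samples) + alpha * bins
    return [(c + alpha) / total for c in counts]


def renormalized(p0, t_eff):
    """Gibbs-like distribution ~ exp(-H_eff / t_eff) with H_eff = -ln p0."""
    log_p = [math.log(p) for p in p0]
    top = max(log_p)  # subtract the maximum for numerical stability
    w = [math.exp((lp - top) / t_eff) for lp in log_p]
    z = sum(w)
    return [wi / z for wi in w]


def mean_h(dist, p0):
    """Average of H_eff = -ln p0 over the distribution dist."""
    return -sum(d * math.log(p) for d, p in zip(dist, p0))


def entropy(dist):
    return -sum(d * math.log(d) for d in dist if d > 0.0)


def s_theorem(p0, p1, t_lo=1e-3, t_hi=1e3, iters=200):
    """Return (t_eff, delta_s), delta_s = renormalized S0 minus S1."""
    target = mean_h(p1, p0)
    if not (mean_h(renormalized(p0, t_lo), p0) <= target
            <= mean_h(renormalized(p0, t_hi), p0)):
        raise ValueError("no effective temperature in the search bracket")
    for _ in range(iters):  # mean H_eff grows monotonically with t_eff
        mid = 0.5 * (t_lo + t_hi)
        if mean_h(renormalized(p0, mid), p0) < target:
            t_lo = mid
        else:
            t_hi = mid
    t_eff = 0.5 * (t_lo + t_hi)
    return t_eff, entropy(renormalized(p0, t_eff)) - entropy(p1)


def demo(seed=0, n=5000, bins=40):
    rng = random.Random(seed)
    # State "0": pure noise (hypothetical stand-in for thermal fluctuations).
    x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # State "1": oscillation plus weak noise (hypothetical developed generation).
    x1 = [2.0 * math.sin(0.05 * k) + rng.gauss(0.0, 0.1) for k in range(n)]
    lo, hi = min(x0 + x1), max(x0 + x1)
    return s_theorem(histogram(x0, lo, hi, bins), histogram(x1, lo, hi, bins))


if __name__ == "__main__":
    t_eff, delta_s = demo()
    print("effective temperature:", t_eff)
    print("entropy difference:", delta_s)
```

An effective temperature above unity indicates that state "0" was correctly chosen as physical chaos, and the positive entropy difference then measures how much more ordered state "1" is. If the effective temperature comes out below unity, the roles of the two states should be exchanged.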

G.  Diagnosis  of  medico-biological  objects  [3,12] 

Let  us  consider  some  applications  of  the  S-theorem  for 
the  purpose  of  medico-biological  diagnostics.  Investiga- 
tions into  this  problem  were  initiated  in  Kiev  and  Moscow 
in  1990,  using  both  mathematical  models  and  experimen- 
tal data.  In  1994,  the  first  results  of  the  analysis  of  car- 
diograms based  on  the  S-theorem  were  obtained  by  the 
joint  efforts  of  biologists  and  clinicians  in  the  Laborato- 
ries of  Nonlinear  Dynamics  at  the  Saratov  and  Potsdam 
Universities. 

Biological  experiments  reported  by  T.  G.  Anishchenko
from  Saratov  University  revealed  significant  differences  in
the  responsiveness  of  male  and  female  rats  to  the  noise 
stress.  Biochemical  studies  have  demonstrated  opposite 
changes  in  the  behaviour  of  the  two  sexes.  This  finding 
provided  the  basis  for  a  study  of  men  and  women's  be- 
haviour in  response  to  stress.  The  evaluation  was  also 
made  using  the  S-theorem. 

Two cardiograms were obtained from each subject
included in the study, one before and the other after an
identical stress impact (a shrill acoustic signal).

Since two cardiograms were available from each subject,
the change in the relative degree of order could be
estimated individually using the S-theorem. The
experiment has demonstrated opposite changes in the degree of
order in men and women, the former showing a decreased
degree of chaos, while in the latter it increased.

In  both  cases,  there  was  a  deviation  from  the  "norm 
of  chaos"  suggesting  "pathology".  It  is  for  physicians  to 
decide  which  "disease"  is  more  dangerous. 

The  return  to  the  "norm  of  chaos"  may  be  sponta- 
neous. Then,  the  "recovery"  occurs  unaided,  with  time 
serving  as  the  control  parameter. 

If the patient's condition is normalized by drug
therapy, its efficacy can be evaluated using the same
criterion.

Naturally,  each  doctor  has  his  (her)  own  criteria  un- 
related to  the  S-theorem.  However,  it  may  be  equally 
useful  to  take  advantage  of  the  additional  objective  in- 
formation derived  from  the  analysis  of  the  relative  degree 
of  order  as  described  above. 

V.  WHAT  IS  SELF-ORGANIZATION?  [12] 

Two  classes  of  systems  were  outlined  in  a  previous  Sec- 
tion. 


One  of  them  includes  many  physical  systems  exempli- 
fied in  the  foregoing  discussion  by  two  cases.  To  begin 
with,  it  is  a  Van  der  Pol  generator  in  which  losses  (of 
electrical  resistance)  are  first  compensated  as  the  feed- 
back parameter  grows  while  its  further  rise  results  in  the 
transition  to  the  developed  generation  region.  According 
to  the  S-theorem,  this  is  a  case  of  selforganization.  This 
process starts from equilibrium, that is, thermal
fluctuations in an electrical circuit in the absence of feedback.
This  leads  to  the  conclusion  that  the  process  of  selfor- 
ganization may  be  defined  as  the  transition  from  a  most 
chaotic  (equilibrium)  state  to  a  more  ordered  one  (gen- 
eration). 

The  situation  is  similar  in  the  transition  from  lami- 
nar to  turbulent  flow  in  a  pipe  with  increasing  pressure 
difference  (a  higher  Reynolds  number). 

Here,  the  reference  point  for  the  degree  of  chaos  is  also 
the  equilibrium  state  of  a  fluid  in  the  absence  of  pressure 
difference,  that  is  at  the  zero  control  parameter.  In  this 
case,  hydrodynamic  motion  is  lacking  and  only  chaotic 
motion  of  molecules  occurs.  Evidently,  this  state  is  most 
chaotic. 

Again,  the  process  of  self-organization  is  the  transi- 
tion from  a  more  chaotic  to  a  less  chaotic  state.  Is  this 
the  universal  definition  of  self-organization?  It  can  be 
inferred  from  the  previous  section  that  the  process  of 
self-organization  is  not  necessarily  associated  with  an  in- 
crease in  the  degree  of  order. 

Indeed,  there  is  a  broad  class  of  systems  (in  the  first 
place,  biological  systems)  for  which  neither  the  state  of 
complete  chaos  (thermodynamic  equilibrium)  nor  that  of 
ideal  order  can  be  realized.  Biological  systems  would  not 
function  under  such  conditions. 

A  more  fundamental  notion  for  such  systems  is  the 
"norm  of  chaos"  which  has  been  used  more  than  once  in 
the  previous  discussion.  This  notion  is  compatible  with 
that of "health". Then, self-organization is the process of
convalescence.

Now,  let  us  turn  back  to  the  studies  on  the  respon- 
siveness of  men  and  women  to  stress.  Earlier,  we  have 
agreed  to  regard  post-stress  conditions  as  "pathology". 
This means that the transition to the "norm of chaos"
in women is actually the "recovery" referred to above as
self-organization, i.e. the transition from a more chaotic
to a less chaotic state.

Conversely,  the  stress-induced  state  for  men  is  "illness" 
which  corresponds  to  a  more  ordered  state. 

Hence, the "recovery" (self-organization) for men is the
transition from an ordered state to a more chaotic one.

Thus,  the  concepts  of  self-organization  and  degrada- 
tion in  biological  systems  cannot  be  unequivocally  related 
to  an  enhanced  (self-organization)  or  impaired  (degrada- 
tion) degree  of  order  respectively. 

A  more  fundamental  notion  for  such  systems  is  the 
"norm  of  chaos"  which  can  be  estimated  from  empirical 
data  using  the  'S-theorem'. 

To summarize, it appears from the above analysis that
in certain cases self-organization is easy to observe, e.g.
the generation developing in a Van der Pol system with
an increasing feedback parameter. Other well-known examples
are the appearance of a new structure (Bénard cells) at the
surface of a liquid heated from below and Taylor vortices
between rotating coaxial cylinders. Using the very apt term
"dissipative structures" coined by I. Prigogine, the
self-organization process may be described as the spontaneous
occurrence of structures in nonlinear dissipative open
systems, e.g. temporal dissipative structures in the Van der
Pol generator and spatial dissipative structures exemplified
by the Bénard cells and Taylor vortices. Elimination of the
control parameter (feedback, temperature gradient, etc.) in
all these cases results in a "system at rest", i.e. one in
the state of thermodynamic equilibrium.
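
The onset of generation in a Van der Pol system can be illustrated numerically. The sketch below is only an illustration and is not taken from the original text: it assumes the equation in the form x'' - (a - b x^2) x' + omega0^2 x = 0, with a playing the role of the feedback parameter. The function name van_der_pol_amplitude and all numerical values are hypothetical choices.

```python
def van_der_pol_amplitude(a, b=1.0, omega0=1.0, x0=0.01, t_end=400.0, dt=0.01):
    """Integrate x'' - (a - b*x**2)*x' + omega0**2*x = 0 with the RK4 scheme.

    Returns the largest |x| observed over the final 20 time units, which
    serves as an estimate of the steady-state oscillation amplitude.
    The equation form and all default values are illustrative assumptions.
    """
    def deriv(x, v):
        # First-order form: x' = v, v' = (a - b*x^2)*v - omega0^2*x
        return v, (a - b * x * x) * v - omega0 ** 2 * x

    x, v = x0, 0.0
    steps = int(round(t_end / dt))
    tail_start = steps - int(round(20.0 / dt))
    amplitude = 0.0
    for i in range(steps):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = deriv(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = deriv(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if i >= tail_start:
            amplitude = max(amplitude, abs(x))
    return amplitude


# Below threshold (a < 0) the oscillation decays; above it a limit cycle forms.
for feedback in (-0.1, 0.1):
    print(feedback, van_der_pol_amplitude(feedback))
```

For a < 0 the small initial perturbation dies out (the "system at rest"), while for small a > 0 the amplitude settles near 2*sqrt(a/b), the value given by the standard weakly nonlinear estimate for this form of the equation.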

Such  understanding  of  the  term  "self-organization"  un- 
derlies the  theory  of  formation  of  dissipative  structures. 
The  first  systematic  exposition  of  this  range  of  problems 
has  been  given  in  the  well-known  book  of  I  Prigogine  and 
G.  Nicolis.  The  starting  point  was  Prigogine's  ideas  on 
thermodynamics  of  irreversible  nonequilibrium  processes. 

Haken's  theory  of  self-organization  is  based  on  the  ap- 
pearance of  structures  due  to  collective  interactions.  In 
other  words,  cooperative  processes  are  posited  as  being 
of  primary  importance.  This  prompted  H.  Haken  to  use 
the  term  "synergetics"  for  this  new  interdisciplinary  field 
of  research. 

In more complicated cases, such as the transition from one
turbulent motion to another or processes in biological
systems, it is possible to distinguish between degradation
and self-organization using the "S-theorem" criterion. In
such cases, the understanding of self-organization as the
appearance of new structures or the transition from less to
more ordered states becomes insufficient.

This  inference  is  valid  for  all  systems  in  which  the 
equilibrium  state  can  not  serve  as  the  reference  point 
for  the  relative  degree  of  chaos  (or  order).  Here,  the 
"norm  of  chaos"  concept  is  of  greater  importance  and,  in 
the  general  case,  certainly  applies  to  the  nonequilibrium 
state,  with  the  transition  from  "pathology"  to  "health" 
corresponding  to  self-organization.  Since  deviation  from 
the  norm  is  possible  in  two  directions  (towards  a  greater 
or  smaller  degree  of  chaos),  the  self-organization  process 
may  in  the  general  case  also  proceed  in  two  directions. 

Therefore,  the  traditional  definition  of 
self-organization  as  the  spontaneous  formation  of  struc- 
tures in  dynamic  nonlinear  dissipative  open  systems  is 
too  "narrow".  A  more  comprehensive  description  of  self- 
organization  processes,  even  their  mere  identification,  is 
feasible  by  the  methods  of  the  statistical  theory  of  open 
systems. 

The  term  "self-organization"  is  actually  rooted  deep 
in  ancient  thought.  This  is  a  very  interesting  question 
worthy  of  illustration  by  the  following  facts. 

In 1966, the book "Principles of Self-Organization"
was published in Russian. It is a collection of reports
delivered at a symposium at the University of Illinois,
USA, in 1961. Here is a quotation from the Preface to the
Russian edition by A. Lerner, the editor:

"Despite  the  marked  prevalence  of  self-organizing  sys- 
tems and  persistent  attempts  of  scientists  to  under- 
stand the  phenomena  occurring  in  such  systems,  self- 
organization  has  in  a  way  remained  for  many  centuries 
perhaps  the  most  mysterious  phenomenon,  the  most  in- 
timate of  nature's  secrets" .  The  Preface  goes  on  to  state: 
"...  the  reader  will  hardly  find  here  a  report  which  would 
not  claim  to  disclose  the  mystery  of  selforganization" . 

Heinz  von  Foerster,  the  editor  of  the  American  pub- 
lication, writes  in  the  Introduction  with  reference  to  a 
story  by  Plato,  a  famous  Greek  philosopher: 

"The  house  of  Agathon  was  the  place  where  the  first 
memorable  symposium  was  held  on  the  problems  ly- 
ing at  the  junction  of  different  sciences,  attended  by 
philosophers,  statesmen,  dramatists,  poets,  sociologists, 
linguists,  doctors  and  students  learning  various  trades". 

The report by W. R. Ashby, a well-known expert in the field,
contains a statement to the effect that the word
"self-organization" can also mean "transition from bad to
good organization", even though the author does not
explain how to distinguish between "bad" and "good".
An approach to this problem is illustrated by the
above-mentioned analysis of cardiograms, which made it
possible to differentiate between "health" and "pathology".
Such a distinction is also possible based on the above
criterion for the relative degree of chaos in different
states of open systems.

Naturally,  there  are  more  diagnostic  criteria  to  evalu- 
ate the  state  of  biological  systems.  However,  the  com- 
parison of  different  diagnostic  tools  is  a  matter  which 
requires  special  attention. 

VI.  PHYSICS  OF  OPEN  SYSTEMS  FOR 
SOCIOLOGISTS  AND  ECONOMISTS 

H.  Haken  has  reported  one  of  the  earliest  applications 
of  synergetics  to  sociology.  The  expedience  of  applying 
synergetics  for  this  purpose  is  due  to  the  important  role 
of  collective  effects  in  social  processes.  Specifically,  they 
are  to  a  great  extent  involved  in  shaping  public  opinion 
even  though  separate  acts  of  choice  are,  by  necessity,  indi- 
vidual. A  model  survey  of  social  systems  was  carried  out 
by  the  group  of  W.  Weidlich  who  suggested  simple  mod- 
els for  the  description  of  the  formation  of  public  opinion, 
population  migrations,  and  urbanization. 

At  present,  methods  of  synergetics  are  extensively  em- 
ployed to  simulate  economic  processes.  Economics  is  an 
ancient  science  with  deeply  rooted  traditions  and  ad- 
vanced methods  of  qualitative  and  quantitative  descrip- 
tion of  various  processes.  Nevertheless,  there  is  a  wealth 
of  unresolved  problems  challenging  the  physics  of  open 
systems  to  apply  its  methodology  in  this  field. 

A substantial part of these problems is related to the
optimization of the relationships between production,
distribution, and consumption based on the criterion for
the relative degree of order in open (social or economic)
systems. Such an approach may hopefully provide additional
information necessary to monitor the efficiency of
the assumed control parameters, to estimate the 'norm
of chaos', and to "treat the disease", that is, a deviation
from the 'norm of chaos' on either side.

In the case of spontaneous "recovery", i.e. without
interference from the outside ("medication"), "convalescence"
may also be regarded as a self-organization process.
As with biological systems, the equilibrium state cannot
serve as the reference point in estimating the relative
degree of chaos in social and economic systems. Only the
state corresponding to the "norm of chaos" may be used for
this purpose. Identification of such a state is the principal
task, which can be accomplished using the criteria for the
relative degree of order in the physics of open systems.

A few years ago, G. Caglioti [15], an Italian investigator,
published a popular-science book under the title
The Dynamics of Ambiguity. One of the first pages in the book
reads as follows: "A study of perception may reveal
integrating factors. That is, originally disordered sensory
stimuli become correlated and organized in the brain into
ordered coherent structures which are then converted to
a thought". In other words, the transition from perception
to idea is the transition from a less ordered to a more
ordered state of the brain.

True,  this  is  a  very  beautiful  picture  of  the  birth  of  an 
idea.  The  question  is  how  close  it  is  to  reality.  The  book 
gives  no  answer  since  it  does  not  consider  the  criteria  for 
a  relative  degree  of  order  in  open  systems  which  would 
allow  for  the  distinction  between  'order'  and  'chaos'. 
Doubtless, some information about the change in orderliness
accompanying the generation of an idea can be
obtained from the analysis of brain activity using
encephalograms and the above criteria from the physics of
open systems, specifically the S-theorem.

Such  an  approach  implies  a  series  of  experimental  stud- 
ies designed  to  elucidate  "the  thought  production  rate", 
its  difference  in  men  and  women,  the  influence  on  artis- 
tic performance,  etc.  Naturally,  joint  efforts  of  specialists 
representing  different  scientific  disciplines  are  necessary 
to  solve  such  a  difficult  problem. 

Finally, all this brings to mind the famous book by
Erwin Schrödinger, What is Life?, published in English in
1944 and in Russian in 1947. We shall refer to only one
fragment directly related to the present discussion.

A very interesting statement that Schrödinger makes
in the final chapter of his book is worth citing:

"An organism's astonishing gift of concentrating a
"stream of order" on itself and thus escaping the decay
into atomic chaos - of "drinking orderliness" from a
suitable environment - seems to be connected with the
presence of the "aperiodic solids", the chromosome molecules,
which doubtless represent the highest degree of
well-ordered atomic association we know of - much higher
than the ordinary periodic crystal...".


This is truly a remarkable thought, but it cannot be
considered here at greater length. Suffice it to answer the
following question: "Is the degree of order in an aperiodic
crystal higher than in a usual periodic one, in terms of
the above theory?" There is every reason to argue that
the answer must be in the affirmative!

Indeed,  there  is  an  analogy  with  the  relative  degree  of 
order  for  laminar  and  turbulent  flows.  It  seems  natural  to 
identify  a  laminar  flow  with  a  periodic  crystal  and  a  tur- 
bulent one  with  an  aperiodic  crystal.  The  thermal  atomic 
motion  in  periodic  crystals  may  be  assumed  to  represent 
the  state  of  physical  chaos.  Hence,  collective  degrees  of 
freedom  in  aperiodic  crystals  are  of  greater  importance 
than  in  periodic  ones.  This  suggests,  in  conformity  with 
the  S-theorem,  a  higher  degree  of  order  in  an  aperiodic 
crystal  than  in  a  periodic  one.  This  inference,  however, 
remains  to  be  quantitatively  confirmed. 


VII.  KINETIC  DESCRIPTION  OF  THE  SECOND 
ORDER  PHASE  TRANSITIONS 

In this part the kinetic method of the physics of open
systems is applied to the description of phase transitions
in ferroelectrics [16].


A.  Kinetic  equation 

Let X(R,t) be the local value of the relative displacement
of the crystal sublattices of a ferroelectric. In the Landau
theory it plays the role of the local order parameter for
the nonsymmetrical phase.

Following the Landau theory [17-20], we introduce the
effective Hamilton function. For the ferroelectric model,
the effective Hamilton function h_eff for one particle
is defined by the expression [21,16]

    h_{eff}(X, a_f) = \frac{m\omega_0^2}{2} \left[ (1 - a_f) X^2 + \frac{1}{2} b X^4 \right]    (7.1)

Here m is the mass of an atom, m\omega_0^2 is the stiffness, and
\omega_0 is the natural frequency of the atomic vibrations. The
parameter a_f(T) describes the influence of the Lorentz
effective field on the stiffness. This parameter depends
on the temperature. Near the critical point, following
the Landau theory, this dependence is defined
by the expression


    1 - a_f = \frac{T - T_c}{T_c}    (7.2)


Here  Tc  is  the  critical  temperature. 

To obtain a qualitative picture of the behavior of the
system at all temperatures, the function a_f(T) can be
represented by the following expression:

    1 - a_f = \tanh \frac{T - T_c}{\Delta T}    (7.3)

Here \Delta T is the "temperature width" of the transition
region. The parameter b characterizes the role of
nonlinearity.

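For the kinetic description of second-order phase
transitions we shall use the kinetic equation for a
distribution function f(X,R,t) that depends not only on the
internal variable X but also on the coordinates R of the
bistable elements [3,4,16]:

    \frac{\partial f(X,R,t)}{\partial t} = \frac{\partial}{\partial X} \left[ D_{(X)} \frac{\partial f}{\partial X} + \frac{1}{m\gamma} \frac{\partial h_{eff}(X, a_f)}{\partial X} f \right] + D \frac{\partial^2 f}{\partial R^2}    (7.4)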


It contains two dissipative terms, which are determined
by the redistribution of bistable elements in the space R
with the diffusion coefficient D and by the redistribution
in the space X with the diffusion coefficient D_(X),
respectively. For simplicity we assume below that the two
diffusion coefficients are equal. The equilibrium solution
of the kinetic equation for the homogeneous distribution
coincides with the Boltzmann distribution

    f_B(X, a_f, T) = \exp \frac{F - h_{eff}(X, a_f)}{k_B T}    (7.5)

The Hamilton function h_eff depends on the temperature
and, as a consequence, an additional thermodynamic
"thermal force" appears:

    F_{th}(T) = - \int \frac{\partial h_{eff}(X, T)}{\partial T} f(X, a_f, T) \, dX    (7.6)

This leads to a change in the thermodynamic relations.
In particular, the well-known Helmholtz relation now has
the form

    \frac{dF(T)}{dT} = - S_{eff}(T) - F_{th}(T)    (7.7)

Here F(T) is the Gibbs free energy, and the effective
entropy S_eff(T) for the whole system is defined by the
Boltzmann distribution f_B(X, a_f, T).
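
The modified Helmholtz relation can be checked numerically for a temperature-dependent Hamilton function. The following sketch is an illustration under stated assumptions, not the authors' calculation: it takes the quartic form h_eff = (m omega0^2 / 2) [(1 - a_f) X^2 + (b/2) X^4] with 1 - a_f = tanh((T - Tc)/DeltaT), sets k_B and m omega0^2 to 1, and uses hypothetical names (h_eff, free_energy, helmholtz_sides) and parameter values.

```python
import math

K_B = 1.0            # Boltzmann constant (illustrative units)
M_OMEGA0_SQ = 1.0    # stiffness m*omega0**2 (illustrative units)
T_C, DELTA_T, B = 1.0, 0.2, 1.0   # assumed critical temperature, width, nonlinearity


def h_eff(x, t):
    """Assumed effective Hamilton function with 1 - a_f = tanh((T - Tc)/DeltaT)."""
    one_minus_af = math.tanh((t - T_C) / DELTA_T)
    return 0.5 * M_OMEGA0_SQ * (one_minus_af * x**2 + 0.5 * B * x**4)


def dh_eff_dt(x, t):
    """Partial derivative of h_eff with respect to temperature at fixed X."""
    sech_sq = 1.0 / math.cosh((t - T_C) / DELTA_T) ** 2
    return 0.5 * M_OMEGA0_SQ * (sech_sq / DELTA_T) * x**2


def integrate(func, lo=-6.0, hi=6.0, n=4000):
    """Trapezoidal rule on [lo, hi]; the integrands decay fast enough for this."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * step)
    return total * step


def free_energy(t):
    """F(T) = -k_B T ln Z, so that exp((F - h_eff)/(k_B T)) is normalized."""
    z = integrate(lambda x: math.exp(-h_eff(x, t) / (K_B * t)))
    return -K_B * t * math.log(z)


def helmholtz_sides(t, eps=1e-4):
    """Return (dF/dT by central differences, -S_eff - F_th) at temperature t."""
    f_val = free_energy(t)

    def boltzmann(x):
        return math.exp((f_val - h_eff(x, t)) / (K_B * t))

    mean_h = integrate(lambda x: h_eff(x, t) * boltzmann(x))
    entropy = (mean_h - f_val) / t          # equals -k_B * integral of f ln f
    thermal_force = -integrate(lambda x: dh_eff_dt(x, t) * boltzmann(x))
    lhs = (free_energy(t + eps) - free_energy(t - eps)) / (2.0 * eps)
    return lhs, -entropy - thermal_force


for temperature in (0.8, 1.0, 1.3):
    print(temperature, helmholtz_sides(temperature))
```

The two numbers printed for each temperature should agree to within the finite-difference error; without the thermal-force term they would differ wherever h_eff varies with T.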


B.  Thermodynamic  functions  in  Landau  theory 

In the Landau theory, thermodynamic functions are defined
through the most probable values X_{m.p.} of the
distribution function:

    X_{m.p.} = 0 \text{ at } \tau > 0;    X_{m.p.} = \pm \sqrt{|\tau| / b} \text{ at } \tau < 0    (7.8)

Here \tau = (T - T_c)/T_c, cf. (7.2).

At the critical point the heat capacity has a jump:

    \Delta C = \frac{k_B N}{2 X_T^2 b},    X_T^2 = \frac{k_B T}{m\omega_0^2}    (7.9)


We see that in the Landau theory the peak of the function
C(T) is absent. In this description the heat capacity is
determined not by the Boltzmann entropy but by the thermal
force. Thus the Helmholtz relation is replaced by the
following expression:

    \frac{dF(X_{m.p.}, T)}{dT} = - F_{th}(X_{m.p.}, T),    S_{eff}(X_{m.p.}, T) = 0    (7.10)

The Boltzmann entropy is not taken into account at this
level of description.


C.  The  first-moment  approximation.  Monodomain 

state 

On the basis of the kinetic equation we can obtain
the chain of equations for the moments \langle X^n \rangle. In the
self-consistent approximation, when \langle X^n \rangle = \langle X \rangle^n, we
obtain a closed equation for the first moment \langle X \rangle. In this
approximation the distribution function f(X,R,t) can be
represented in the form:

    f(X,R,t) = \delta(X - X(R,t)),    \langle X \rangle = X(R,t)    (7.11)

The kinetic equation is reduced in this case to the
reaction-diffusion equation for the first moment, the
function X(R,t) [20,22]:

    \frac{\partial X(R,t)}{\partial t} = -\Gamma \left[ \frac{T - T_c}{T_c} + b X^2(R,t) \right] X(R,t) + D \frac{\partial^2 X(R,t)}{\partial R^2}    (7.12)

It differs from the corresponding Ginzburg-Landau equation
in the structure of the dissipative terms.

With the help of this equation it is possible to describe,
for example, the structure of domain walls in
ferroelectrics. In the stationary one-dimensional state (R is
parallel to y) the solution is defined by the expression:

    X(y) = \sqrt{\frac{T_c - T}{T_c b}} \tanh \frac{y}{d}    (7.13)

The width of the wall is defined by the formula:

    d = X_T \left( \frac{2 T_c}{T_c - T} \right)^{1/2}    (7.14)

The ratio of the thickness of the wall d to the "amplitude"
at y = -\infty is defined by the expression

    \frac{d}{|X(y = -\infty)|} = (2 X_T^2 b)^{1/2} \frac{T_c}{T_c - T}    (7.15)


On approaching the critical point this ratio increases
according to the Curie law. At the critical point itself this
solution loses its meaning. On the basis of the kinetic
equation we can evaluate this ratio at the critical point
as well.
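
The domain-wall profile can be verified by substituting it into the stationary reaction-diffusion equation. The sketch below is illustrative only: it assumes the stationary equation D X'' = Gamma (tau + b X^2) X with tau = (T - Tc)/Tc < 0, a wall X(y) = X0 tanh(y/d) with X0 = sqrt(|tau|/b) and d = sqrt(2D/(Gamma |tau|)), and hypothetical parameter values.

```python
import math

D = 1.0        # spatial diffusion coefficient (illustrative value)
GAMMA = 1.0    # "reaction" coefficient (illustrative value)
B = 1.0        # nonlinearity parameter b
TAU = -0.2     # (T - Tc)/Tc, negative below the critical point

X0 = math.sqrt(abs(TAU) / B)                      # wall "amplitude"
WIDTH = math.sqrt(2.0 * D / (GAMMA * abs(TAU)))   # wall width d


def wall(y):
    """Assumed stationary wall profile X(y) = X0 * tanh(y / d)."""
    return X0 * math.tanh(y / WIDTH)


def residual(y, h=1e-3):
    """D*X'' - Gamma*(tau + b*X^2)*X, with X'' from central differences."""
    second = (wall(y + h) - 2.0 * wall(y) + wall(y - h)) / h**2
    x = wall(y)
    return D * second - GAMMA * (TAU + B * x * x) * x


# Scan the wall region; the residual should vanish up to discretization error.
max_residual = max(abs(residual(0.1 * i)) for i in range(-200, 201))
print("max residual:", max_residual)
print("d / X0:", WIDTH / X0)
```

With D/Gamma playing the role of X_T^2, the printed ratio d/X0 equals sqrt(2 X_T^2 b)/|tau|, which grows as 1/|T - Tc| on approaching the critical point.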
In order to describe kinetic fluctuations, we shall use
the kinetic equation with a Langevin source. The
intensity of the noise is defined by the sum of the
"reaction" and diffusion terms.

By solving the linearized equation (7.12) for the region
T > T_c, we get the following expression for the space-time
spectral density of the fluctuations \delta X(R,t):

m'^JX^->m-  (716) 

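We use here the notation \gamma_{(X)} for the "reaction"
dissipative coefficient at T > T_c and \langle (\delta X)^2 \rangle for the
dispersion of the Boltzmann distribution in the Gaussian
approximation:

    \gamma_{(X)} = \Gamma \frac{T - T_c}{T_c};    \langle (\delta X)^2 \rangle = \frac{k_B T}{m\omega_0^2} \frac{T_c}{T - T_c} = X_T^2 \frac{T_c}{T - T_c}    (7.17)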

In expression (7.16) both dissipative terms are taken
into account on an equal footing. As a consequence,
the spatial spectral density of fluctuations and the spatial
correlation function are defined by the formulas:

    (\delta X \delta X)_k = \frac{1}{n} \langle (\delta X)^2 \rangle;    (\delta X \delta X)_{R,R'} = \frac{1}{n} \langle (\delta X)^2 \rangle \, \delta(R - R')    (7.18)

Thus the spatial correlation function is no longer defined,
as in the Landau theory, by the Ornstein-Zernike (O-Z)
formula. The dispersion (\delta X \delta X)_{R=R'} differs from
zero only in the volume of a point of the continuous medium
(a physically infinitesimal volume V_ph), since
\delta(R - R')|_{R=R'} = V_{ph}^{-1}. For the one-point
correlator we have, as a result, the expression:

    (\delta X \delta X)_{R=R'} = \frac{1}{N_{ph}} \langle (\delta X)^2 \rangle    (7.19)

Here N_ph = n V_ph is the number of particles in a physically
infinitesimal volume. Thus the dispersion of fluctuations
smoothed over the volume of a "point" is N_ph times smaller
than for the Boltzmann distribution.

The O-Z formula is now connected not with the integral
of the spectral line over the frequency \omega, but with the
spectral line at zero frequency \omega = 0. After integration
over the wave number we get:

    (\delta X \delta X)_{\omega=0, r} \, \gamma_{(X)} \propto \frac{1}{n} \langle (\delta X)^2 \rangle \frac{1}{r} \exp\left( -\frac{r}{r_c} \right);    r_c^2 = X_T^2 \frac{T_c}{T - T_c}    (7.20)

We see that the meaning of the O-Z formula in the theory
of phase transitions is changed.

We can now introduce a new correlation parameter K,
which is analogous to the Ginzburg parameter Gi:

K  ^   r  -  (  ^^rrr  1  '  .  (7.21)
X2
m.p
nr?  \\T-TC\

In the kinetic theory it is possible to define the
correlation parameter for all values of temperature
(see below).


VIII. APPROXIMATION OF THE SECOND
MOMENT. POLYDOMAIN FERROELECTRICS

Let the system have a polydomain structure, so that
the first moment \langle X(R,t) \rangle = 0. In this case it is more
natural to use the self-consistent approximation for the
second moments, \langle X^{2n} \rangle = \langle X^2 \rangle^n with \langle X^2 \rangle = \langle E \rangle, which
again leads to a reaction-diffusion equation, but now for
the function \langle X^2 \rangle = \langle E \rangle = E(R,t):

    \frac{\partial E(R,t)}{\partial t} = 2 \left[ D_{(X)} - \Gamma \left( \frac{T - T_c}{T_c} + b E(R,t) \right) E(R,t) \right] + D \frac{\partial^2 E(R,t)}{\partial R^2}    (8.1)


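The stationary solution for the homogeneous state is
determined by the equation:

    \frac{T - T_c}{T_c} \langle X^2 \rangle + b \langle X^2 \rangle^2 = X_T^2,    E(R,t) = \langle X^2 \rangle    (8.2)

For three selected states:

    \langle X^2 \rangle = \begin{cases} X_T^2 \frac{T_c}{T - T_c}, & T > T_c \\ \left( \frac{X_T^2}{b} \right)^{1/2}, & T = T_c \\ \frac{T_c - T}{T_c b}, & T < T_c \end{cases}    (8.3)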


We see that the second moment plays the role of the
order parameter for the nonsymmetrical phase. At the
critical point it has a finite value. Above the critical
point the function \langle X^2 \rangle coincides with the dispersion
for the Boltzmann distribution.
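
The three limiting values of the second moment can be recovered from a single closed-form root. The sketch below is an illustration under assumptions: it takes the stationary condition in the form (tau + b E) E = X_T^2 with tau = (T - Tc)/Tc, treats X_T^2 as a constant for simplicity, and uses a hypothetical function name and parameter values.

```python
import math


def second_moment(tau, b=1.0, x_t_sq=0.01):
    """Positive root E of b*E**2 + tau*E - x_t_sq = 0, with tau = (T - Tc)/Tc.

    x_t_sq stands for X_T^2 and is held fixed here, which is a simplifying
    assumption; b and x_t_sq are illustrative values.
    """
    return (-tau + math.sqrt(tau * tau + 4.0 * b * x_t_sq)) / (2.0 * b)


# Compare the exact root with the three limiting expressions.
print(second_moment(0.5), 0.01 / 0.5)          # T > Tc: about X_T^2 / tau
print(second_moment(0.0), math.sqrt(0.01))     # T = Tc: (X_T^2 / b)^(1/2)
print(second_moment(-0.5), 0.5)                # T < Tc: about |tau| / b
```

The root varies smoothly through tau = 0, which is the sense in which the second moment stays finite at the critical point.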

To calculate fluctuations in the approximation considered,
we can carry out the following replacement in the kinetic
equation:

    \frac{1}{m\gamma} \frac{\partial h_{eff}(X, a_f)}{\partial X} \to \Gamma \left( \frac{T - T_c}{T_c} + b \langle X^2 \rangle \right) X    (8.4)


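As a result the kinetic equation has the form:

    \frac{\partial f}{\partial t} = \frac{\partial}{\partial X} \left[ D_{(X)} \frac{\partial f}{\partial X} + \Gamma \left( \frac{T - T_c}{T_c} + b E(R,t) \right) X f \right] + D \frac{\partial^2 f}{\partial R^2}    (8.5)

Together with equation (8.2) it constitutes a closed
system of equations for the function f(X,R,t) and the
average energy E(R,t).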

We shall see that there are two types of fluctuations:
"fast" and "slow". The fluctuations of energy are the fast
ones. To describe them we can use the closed equation
(8.1) for the energy fluctuations. The slow fluctuations are
described by the last equation, with the mean energy
determined by the stationary equation (8.2). In this
approximation the kinetic equation (8.5) can be represented
in the form:

    \frac{\partial f}{\partial t} = \frac{\partial}{\partial X} \left[ D_{(X)} \frac{\partial f}{\partial X} + \frac{D_{(X)}}{\langle X^2 \rangle} X f \right] + D \frac{\partial^2 f}{\partial R^2}    (8.6)

The equilibrium solution is the Gaussian distribution

    f(X) = (2\pi \langle E \rangle)^{-1/2} \exp\left( -\frac{X^2}{2 \langle X^2 \rangle} \right)    (8.7)

Now we can proceed to the calculation of fluctuations.


IX.  FAST  AND  SLOW  FLUCTUATIONS  AT 
PHASE  TRANSITIONS 

The  equation  for  the  Fourier  components  of  energy 
fluctuation  5E(R,t)  is: 


at 


T{E) 


(9.1) 


The relaxation time is defined for all temperatures by
the formula

    \frac{1}{\tau_{(E)}} = \Delta_{(E)}(k) = 2\Gamma \left( \frac{T - T_c}{T_c} + 2b \langle X^2 \rangle \right) + D k^2    (9.2)

The average energy \langle X^2 \rangle = \langle E \rangle is defined by the solution
of equation (8.2).

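To calculate the slow fluctuations we shall use
the kinetic equation (8.6) with an appropriate Langevin
source. As a result, the equation for the Fourier components
of the function X(R,t) has the form:

    \left( -i\omega + \frac{1}{\tau_{(X)}} \right) X(\omega, k) = y_{(X)}(\omega, k)    (9.3)

with the corresponding width of the spectral line:

    \frac{1}{\tau_{(X)}} = \Delta_{(X)} = \frac{D_{(X)}}{\langle X^2 \rangle} + D k^2 = \frac{D}{\langle X^2 \rangle} (1 + r_c^2 k^2)    (9.4)

The response to the Langevin source y(\omega, k) is defined
by the expression:

    \chi_{(X)}(\omega, k) = \frac{1}{-i\omega + \frac{D}{\langle X^2 \rangle} (1 + r_c^2 k^2)}    (9.5)

and therefore

    \chi_{(X)}(\omega = 0, k = 0) = \frac{\langle X^2 \rangle}{D} = \frac{1}{\Gamma} \frac{\langle X^2 \rangle}{X_T^2}    (9.6)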


All the characteristics obtained can be determined in the
critical region for all values of temperature, including the
critical point. Their behavior in the symmetrical phase is
qualitatively the same as in the Landau theory. In the
nonsymmetrical phase, however, they differ essentially
from the known ones.

The susceptibility increases according to the Curie law
only on approaching the critical point from the side of the
symmetrical phase. On approaching the critical point from
the side of low temperatures the susceptibility decreases.
This gives reason to speak of the existence of a "jump of
the susceptibility":

X(x)\tc-t>at  -  X(x)\t-tc>at  =  p y2~7,  ^  f  •  (9-7) 

Note that this formula is similar to the one for the jump
of the heat capacity in the Landau theory. Similar
behavior of the dielectric susceptibility is observed in some
kinds of ferroelectrics. There is also a "jump of the
relaxation time".

Finally, the correlation radius is defined by the
expression

    r_c^2 = \frac{D \langle X^2 \rangle}{\Gamma X_T^2} = D \tau_{(X)} = \langle X^2 \rangle    (9.8)

It follows that there is also a "jump of the square of
the correlation radius".

So r_c^2 > X_T^2 for all T < T_c and, as a
consequence, spatial coherence exists for T < T_c.

For slow fluctuations the Ornstein-Zernike formula also
holds only for the temporal correlator at zero frequency.
This means that in the nonsymmetrical phase, at
temperatures T < T_c, the interaction is collective. This
result makes it possible to introduce a finite dimensionless
correlation parameter for all values of temperature.

The heat capacity at the phase transition is defined by
the formula


C(T)  =  ^sN^r 


and  in  the  critical  point 


C(T  =  Tc)  =  \kBN-  ' 


X2b 


(9.9) 


(9.10) 


We see that the jump of the heat capacity is now equal to
zero and that C(T) is a symmetrical function of
temperature with a finite value at the critical point.

Thus, for polydomain systems the jump occurs not in the
heat capacity but in the susceptibility and in the
characteristics connected with it. Similar behavior of
these characteristics is observed in experiments.


X.  SHORT  CONCLUSION 

We have presented here some illustrations of the ideas,
methods and results of the modern statistical theory of open
systems, i.e. systems capable of exchanging matter, energy
and information with the surrounding world. No doubt
the notions of the "norm of chaos" or "norm of order"
introduced above make it possible to differentiate between
degradation and self-organization processes in Intelligent
Systems and Semiotics [23,24].


ABSTRACT

Celestial  bodies  such  as  galaxies,  stellar  clusters,  plane- 
tary systems,  etc.,  have  different  geometric  shapes  (e.g., 
galaxies  can  be  spiral  or  circular,  etc.).  Usually,  compli- 
cated physical  theories  are  used  to  explain  these  shapes; 
for  example,  several  dozen  different  theories  explain  why 
many  galaxies  are  of  spiral  shape.  Some  rare  shapes  are 
still  difficult  to  explain. 

It turns out that to explain these "astroshapes", we do not
need to know the details of physical equations: practically
all the shapes of celestial bodies can be explained by simple
geometric invariance properties. This fact explains, e.g.,
why so many different physical theories lead to the same
spiral galaxy shapes.

This same physical idea is used to solve a different problem:
the optimal sensor placement for non-destructive testing of
aerospace systems.

KEYWORDS: astrogeometry, semiotics, symmetry
groups, non-equilibrium thermodynamics, non-destructive
testing, aerospace structures

1.  SEMIOTIC  SHAPES 

IN  ASTRONOMY:  FORMULATION 

OF  THE  FIRST  PROBLEM 

From  the  computer  viewpoint,  an  astronomical  image  is  a 
set  of  pixels  of  different  brightness.  However,  astronomers 
traditionally  interpret  these  images  in  terms  of  certain  ge- 
ometric shapes,  usually,  described  in  semiotic  terms  (by 
words  and  symbols).  For  example,  they  talk  about  spiral 
or  elliptical  galaxies,  etc.  This  language  is  very  produc- 
tive, because  it  enables  astronomers  to  predict  new  results. 


However,  the  very  origin  of  these  shapes  remains  somewhat 
a  mystery. 

To be more precise, there are several dozen theories that
explain, e.g., the spiral galaxy shape (see, e.g., [2,9,10]),
but the very fact that there are so many different
theories for explaining the same observations probably means
that the physical details involved in these theories are not
needed, and these shapes can follow from fundamental
principles.

In this paper, we show that this is indeed the case: we
can explain these shapes by using the fundamental physical
ideas of symmetry and non-equilibrium thermodynamics.

2.  MAIN  PHYSICAL  IDEA 

The  initial  state  of  the  Universe  was  highly  sym- 
metric. To  find  out  how  shapes  have  been  formed,  let 
us  start  from  the  beginning  of  the  Universe  (for  a  detailed 
physical  description,  see,  e.g.,  Zeldovich  and  Novikov  [13]). 
The  only  evidence  about  the  earliest  stages  of  the  Universe 
is  the  cosmic  3K  background  radiation.  This  radiation 
is  highly  homogeneous  and  isotropic;  this  means  that  ini- 
tially, the  distribution  of  matter  in  the  Universe  was  highly 
homogeneous  and  isotropic.  In  mathematical  terms,  the 
initial  distribution  of  matter  was  invariant  w.r.t.  arbitrary 
shifts  and  rotations. 

We  can  also  say  that  the  initial  distribution  was  invariant 
w.r.t.  dilations  if  in  addition  to  dilation  in  space  (i.e.,  to 
changing  the  unit  of  length),  we  accordingly  change  the 
unit  of  mass. 

In the following text, we will denote the corresponding
transformation group (generated by arbitrary shifts
x → x + a, rotations, and dilations x → λ · x) by G.

Dynamic equations are also symmetric. On the
astronomical scale, of all fundamental forces (strong, weak,
etc.) only two forces are non-negligible: gravity and
electromagnetism. The equations that describe these two
forces are invariant w.r.t. arbitrary shifts, rotations, and
dilations in space. In other words, these interactions are
invariant w.r.t. our group G.

The  problem:  our  world  should  be  symmetric,  but 
it  is  not. 

•  The  initial  distribution  was  invariant  w.r.t.  G; 

•  the  evolution  equations  are  also  invariant; 

hence, we will get a G-invariant distribution of matter for
all moments of time. But our world is not homogeneous.
Why?

Solution: spontaneous symmetry violation. The
reason why we do not see this homogeneous distribution is
that this highly symmetric distribution is known to be
unstable: if, due to a small perturbation, at some point a
in space, density becomes higher than in the neighboring
points, then this point a will start attracting matter from
other points. As a result, its density will increase even
more, while the density of the surrounding areas will
decrease. So, arbitrarily small perturbations cause drastic
changes in the matter distribution: matter concentrates
in some areas, and shapes are formed. In physics, such
symmetry violation is called spontaneous.

Non-equilibrium  thermodynamics  explains  why 
perturbations  usually  preserve  some  symmetry. 

What  kind  of  perturbations  are  possible?  In  principle,  it 
is  possible  to  have  a  perturbation  that  changes  the  initial 
highly  symmetric  state  into  a  state  with  no  symmetries  at 
all,  but  statistical  physics  teaches  us  that  it  is  much  more 
probable  to  have  a  gradual  symmetry  violation:  first,  some 
of  the  symmetries  are  violated,  while  some  still  remain; 
then,  some  other  symmetries  are  violated,  etc.  (Similarly, 
a  (highly  organized)  solid  body  normally  goes  through 
a  (somewhat  organized)  liquid  phase  before  it  reaches  a 
(completely disorganized) gas phase.) At the end, we get
the only stable shape: a rotating ellipsoid.

This idea leads to an explanation of all possible
astroshapes. Before we reach the ultimate ellipsoid stage,
perturbations are invariant w.r.t. some subgroup G' of
the initial group G. If a certain perturbation concentrates
matter, among other points, at some point a, then, due to
invariance, for every transformation $g \in G'$, we will observe
a similar concentration at the point g(a). Therefore, the
shape of the resulting concentration contains, with every
point a, the entire orbit $G'(a) = \{g(a) \mid g \in G'\}$ of the
group G'. Hence, the resulting shape consists of one or
several orbits of a group G'.
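
The notion of an orbit can be made concrete with a small sketch. The code below assumes one particular one-parameter subgroup G' (rotation by an angle alpha about the z-axis combined with dilation by exp(c * alpha)); the function names `act` and `orbit`, the starting point, and the rate `c` are illustrative choices, not taken from the text. For this subgroup, the orbit of a point is a spiral on which the distance from the axis grows as exp(c * alpha) while the ratio z / rho stays constant.

```python
import math

def act(point, alpha, c):
    """Apply the group element g_alpha: rotate by alpha about the z-axis,
    then dilate all three coordinates by exp(c * alpha)."""
    x, y, z = point
    s = math.exp(c * alpha)
    return (s * (x * math.cos(alpha) - y * math.sin(alpha)),
            s * (x * math.sin(alpha) + y * math.cos(alpha)),
            s * z)

def orbit(point, alphas, c):
    """Sample the orbit G'(a) = {g_alpha(a)} at the given parameter values."""
    return [act(point, alpha, c) for alpha in alphas]

if __name__ == "__main__":
    a = (1.0, 0.0, 0.5)   # starting point (arbitrary illustrative choice)
    c = 0.2               # dilation rate per radian (arbitrary illustrative choice)
    for alpha, (x, y, z) in zip(range(4), orbit(a, range(4), c)):
        rho = math.hypot(x, y)
        # rho grows as exp(c * alpha), while z / rho stays constant
        print(alpha, round(rho, 4), round(z / rho, 4))
```

Applying `act` twice with parameters alpha and beta gives the same point as applying it once with alpha + beta, which is the group property that makes the sampled set an orbit.
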

3.  THE  RESULT  OF 
PHYSICAL  ANALYSIS: 
DESCRIPTION  OF  ASTROSHAPES 

In view of the above analysis, to describe all possible shapes
of celestial bodies, it is sufficient to describe all possible
orbits of subgroups G' of the group G (= all shifts, rotations,
and dilations). In this paper, we will show that this
description really describes all known astroshapes. (Some of
these results were first announced in [3-5,6-8].)

A word of warning: geometric shapes are only
approximate. Objects of nature can only approximately be
described  by  geometric  figures.  Correspondingly,  in  our 
physical  explanation,  perturbations  are  only  approximately 
invariant  w.r.t.  G'.  The  farther  away  from  the  point  a,  the 
less  similar  is  the  point  g(a)  to  the  point  a.  Therefore,  in 
reality,  we  may  observe  not  the  entire  orbit,  but  only  a 
part  of  it. 

Possible orbits. 0-, 1-, and 2-dimensional orbits of
continuous subgroups G' of the group G are easy to
describe:

0: The only 0-dimensional orbit is a point.

1: A generic 1-dimensional orbit is a conic spiral that
is described (in cylindrical coordinates) by the
equations $z = k\rho$ and $\rho = R_0 \exp(c\varphi)$. Its limit cases
are:

• a logarithmic spiral: a planar
curve ($z = 0$) that is described (in polar
coordinates) by the equation $\rho = R_0 \exp(c\varphi)$;

• a cylindrical spiral, that is described (in
appropriate coordinates) by the equations $z = k\varphi$,
$\rho = R_0$;

• a circle ($z = 0$, $\rho = R_0$);

•  a  semi-line  (ray); 

•  a  straight  line. 

2:  Possible  2-D  orbits  include: 

•  a  plane; 

•  a  semi-plane; 

•  a  sphere; 

•  a  circular  cylinder,  and 

• a logarithmic cylinder, i.e., a cylinder based on
a logarithmic spiral.

Possible orbits are exactly possible shapes.
Comparing these orbits (and ellipsoids, the ultimate
stable shapes) with astroshapes enumerated in
Vorontsov-Veliaminov [12], we conclude that:

•  First,  our  scheme  describes  all  observed  connected 
shapes. 

•  Second,  all  above  orbits,  except  the  logarithmic 
cylinder,  have  actually  been  observed  as  shapes  of 
celestial  bodies. 

For example, according to Chapter III of
Vorontsov-Veliaminov [12], galaxies consist of components of the
following geometric shapes:

•  bars  (cylinders); 

•  disks  (parts  of  the  plane); 

•  rings  (circles); 

•  arcs  (parts  of  circles  and  lines); 

•  radial  rays; 

•  logarithmic  spirals; 

•  spheres,  and 

•  ellipsoids. 

The only orbit-originated shape that is not in this list is
the logarithmic cylinder. It is easy to explain why a logarithmic
cylinder has never been observed: from whatever point we view
it, the logarithmic cylinder blocks the whole sky, so it does not
lead to any visible shape in the sky at all. With this
explanation, we can conclude that we have a perfect explanation
of all observed astroshapes.

Comment: we can also explain difficult-to-explain
disconnected shapes. In the above description, we only
considered connected continuous subgroups $G' \subset G$.
Connected continuous subgroups explain connected shapes.
It is natural to consider disconnected (in particular,
discrete) subgroups as well; the orbits of these subgroups
lead to disconnected shapes. Thus, we can explain these
shapes, most of which modern astrophysics finds
pathological and difficult to explain (see, e.g., Vorontsov-Veliaminov
[12], Section 1.3).

For example, an orbit O of a discrete subgroup G'' of the
1-D group G' (whose orbit is a logarithmic spiral) consists of
points whose distances $r_n$ to the center form a geometric
progression: $r_n = r_0 \cdot k^n$. Such dependence (called the
Titius-Bode law) has indeed been observed (as early as the 18th
century) for planets of the Solar system and for the
satellites of the planets (this law actually led to the prediction
and discovery of what are now called asteroids). Thus, we
get a purely geometric explanation of the Titius-Bode law.
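
The geometric progression of distances can be checked with a short sketch. The code assumes that the discrete subgroup is generated by a single transformation g that rotates by a fixed angle `theta` and dilates by a fixed factor `k`; the function names and the numerical values of `r0`, `k`, and `theta` are arbitrary illustrations, not astronomical data.

```python
import math

def discrete_orbit(n_points, r0=1.0, k=1.7, theta=math.pi / 4):
    """Points g^n(a), n = 0..n_points-1, where g rotates by theta and dilates by k."""
    return [(r0 * k**n * math.cos(n * theta), r0 * k**n * math.sin(n * theta))
            for n in range(n_points)]

def radii(points):
    """Distances of the points from the center of the spiral."""
    return [math.hypot(x, y) for x, y in points]

if __name__ == "__main__":
    r = radii(discrete_orbit(6))
    ratios = [r[n + 1] / r[n] for n in range(len(r) - 1)]
    # every ratio of consecutive distances equals the dilation factor k
    print([round(q, 6) for q in ratios])
```

All these points lie on the logarithmic spiral with rate c = ln(k) / theta, so the discrete orbit is a sampled version of the continuous one.
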


Less known examples of disconnected shapes that can be
explained in this manner include:

• several parallel equidistant lines
(Vorontsov-Veliaminov [12], Section 1.3);

• several circles located on the same cone, whose
distances from the cone's vertex form a geometric
progression (Vorontsov-Veliaminov [12], Section III.9);

• equidistant points on a straight line
(Vorontsov-Veliaminov [12], Sections VII.3 and IX.3);

• "piecewise circles": equidistant points on a circle; an
example is MCG 0-9-15 (Vorontsov-Veliaminov [12],
Section VII.3);

• "piecewise spirals": points on a logarithmic
spiral whose distances from a center form a geometric
progression; some galaxies of Sc type are like that
(Vorontsov-Veliaminov [12]).

Not only shapes can be thus explained. This idea
also explains the relative frequency of different shapes, the
directions of rotation and magnetic field, possible evolution
of geometric shapes, etc. (see, e.g., [5]).

4.  ALTERNATIVE  EXPLANATION: 
OPTIMIZATION  UNDER 
UNCERTAINTY  AND 
CORRESPONDING  OPTIMAL  SHAPES 

There is an alternative way of analyzing the shapes, which
does not refer to physics at all, but instead looks for
the best approximations of (unknown) actual shapes.
If we use this idea, we face the problem of selecting the best
family of images for use in extrapolation under an
uncertain optimality criterion. How can we solve this problem?
It turns out that for every optimality criterion that satisfies
the natural symmetry conditions (crudely speaking, that
the relative quality of two image reconstructions should
not change if we simply shift or rotate the two images), the
extrapolation shapes that are optimal with respect to this
criterion can be described as orbits of the subgroups of the
corresponding symmetry group.

As a result, we get exactly the shapes used in astronomy
(such as spirals, planes, spheres, etc.). The details of this
description are given in [3,4].

5.  OPTIMAL  SENSOR  PLACEMENT 
FOR  NON-DESTRUCTIVE  TESTING  OF 
AEROSPACE  SYSTEMS:  THE  SECOND 
PROBLEM 

Testing  is  extremely  important.  Structural  integrity 
is extremely important for airplanes, because in flight, the
airframe is subjected to such stressful conditions that even
a  relatively  small  crack  can  be  disastrous.  This  problem 
becomes  more  and  more  important  as  the  aircraft  fleet 
ages. 

Sensors must be placed. At present, most airplanes
do not have built-in sensors for structural integrity, and
even those that have such sensors do not have a sufficient
number of them, so additional sensors must be placed to
test the structural integrity of an airframe.
Each integrity violation (crack, etc.) starts with a small
disturbance that is only detectable in stressful in-flight
conditions. Therefore, to detect these violations as early as
possible, we should complement on-earth testing by in-flight
measurements.

Optimal sensor placement: a problem. Sensors
attached outside the airframe interfere with the airplane's
well-designed aerodynamics; therefore, we should use as
few sensors as possible. The problem is: given the number
of sensors that we can locate on a certain surface of an
airframe, what are the optimal placements of these sensors,
i.e., locations that allow us to detect the locations of the
faults with the best possible accuracy?
For  future  aircraft,  we  have  a  similar  problem  of  sensor 
placement.  The  ideal  design  of  a  future  airplane  should 
include  built-in  sensors  that  are  pre-blended  in  the  perfect 
aerodynamic  shape.  Each  built-in  sensor  is  expensive  to 
blend  in  and  requires  continuous  maintenance  and  data 
processing,  so  again,  we  would  like  to  use  as  few  sensors  as 
possible. 

This optimality problem is difficult to formulate in
precise terms. Both for aging and for future aircraft,
the ideal formulation of the corresponding optimization
problem is to minimize the average detection error for fault
locations. However:

• this ideal formulation requires that we know the
probabilities of different fault locations and the
probabilities of different aircraft operating regimes.

• In reality, especially for a new aircraft, we do not
have these statistics, and for an aging aircraft, the
statistics gathered from its earlier usage may not be
applicable to its current state.

Therefore, instead of a well-defined optimization problem,
we face a not so well defined problem of optimization
under uncertainty. Since the problem is not well defined, we
cannot simply use standard numerical optimization
techniques; we must use intelligent techniques.

Geometric approach. The problem of choosing an
optimal sensor placement can be formulated in geometric
terms: we need to select points (sensor placements) on a
surface of the given structure.

To solve this problem, we use the experience of solving
similar symmetry-based geometric problems of
optimization under uncertainty in image processing and image
extrapolation (see above). Since the basic surface shapes are
symmetric, a similar symmetry-based approach can be
applied to the problem of optimal sensor placement. For the
simplest surfaces such as planes, cones, etc., this general
approach describes several geometric patterns that every
sensor placement that is optimal with respect to a
reasonable (symmetric) optimality criterion must follow.

The use of neural networks. We then use neural
networks:

•  first,  to  confirm  that  these  placement  patterns  indeed 
lead  to  better  fault  location,  and 

•  second,  to  select  a  pattern  that  leads  to  the  best 
results  for  each  particular  problem. 

Discussion about the results. The resulting
placements are different for different problems. For example,

• when we test on-earth, our main goal is not to
miss the crack; as long as we have detected it, we can
always perform additional measurements to determine
its location with any desired accuracy.

• In flight, however, detecting the crack is not enough;
in a fly-by-wire aircraft, we may need to adjust the
control algorithm so as not to stress the faulty
surface. For that, we need to know where exactly this
fault is located.

Space structures: a similar problem. A similar
problem of optimal placement of sensors for non-destructive
testing can be formulated and solved for space structures.

ABSTRACT

Nowadays, we mainly use fourth-generation computers,
and we are designing fifth-generation computers. It
is reasonable to ask: what are the prospects? What will
the computers of generation omega look like?

•  As  the  speed  of  data  processing  increases,  we  face 
a  natural  limitation  of  causality,  according  to  which 
the  speed  of  all  processes  is  limited  by  the  speed  of 
light. 

•  Lately,  a  new  area  of  acausal  (causality  violating) 
processes  has  entered  mainstream  physics. 

This  area  has  important  astrophysical  applications.  In  this 
paper,  we  show: 

•  how  non-equilibrium  thermodynamics  makes  these 
processes  consistent, 

•  how  these  processes  can  be  used  in  computations, 
and 

• how the very possibility of these processes leads to the
granularity of the physical world.

KEYWORDS:  computer  generations,  granularity, 
non-equilibrium  thermodynamics,  quantum  computing, 
acausal  processes 

1. GENERATIONS OF COMPUTERS:
WE NEED FASTER AND FASTER
COMPUTERS

No  matter  how  fast  modern  computers  are,  there  are  still 
problems  that  take  too  much  computational  time  and, 
thus,  cannot  yet  be  handled  by  modern  computers.  To 
solve these problems, we must design faster and faster
computers. So far, the speed of computers has been
doubling every few years. Can we keep up with this increase?

According to special relativity, all velocities are bounded
by the speed of light; thus, to make computer elements
faster, designers try to decrease the size of these elements.
Every hardware technology eventually reaches its limit,
i.e., the smallest element size that this technology can
achieve; after that, to decrease the size further, we need
to invent a new technology. Computers that use this new
technology are usually called computers of a new
generation.
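
The link between element size and speed can be illustrated by simple arithmetic. The sketch below assumes, as a deliberate simplification that is not stated in the text, that one processing cycle requires at least one signal crossing of the element; the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s (exact value by definition of the metre)

def min_signal_time(size_m):
    """Shortest possible time (in seconds) for a signal to cross an element
    of the given size (in metres)."""
    return size_m / SPEED_OF_LIGHT

def max_cycle_rate(size_m):
    """Upper bound (in Hz) on the cycle rate, under the simplifying assumption
    of one signal crossing per cycle."""
    return 1.0 / min_signal_time(size_m)

if __name__ == "__main__":
    for size in (1e-2, 1e-3, 1e-6):  # 1 cm, 1 mm, 1 micrometre
        print(f"{size:g} m: {min_signal_time(size):.3e} s, {max_cycle_rate(size):.3e} Hz")
```

Under this assumption, halving the element size doubles the bound on the cycle rate, which is why each speed-up requires smaller elements.
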

The  existing  4th  generation  computers  are  based  on  VLSI 
technology.  At  the  current  speed-up  rate,  this  technology 
will  soon  exhaust  its  potential.  Physicists  and  engineers 
are  therefore  working  on  new  technologies  for  fifth,  sixth, 
etc.,  generations  of  computers.  Vague  ideas  are  proposed 
for technologies suitable for even further generations. (The
more distant the generation, the vaguer the ideas.)

It  is  therefore  desirable  to  get  a  clear  view  of  the  computers 
of  the  very  distant  future  generations.  We  will  call  these 
computers generation omega after the notation "omega"
(ω) for the first infinite ordinal number proposed by
Cantor, the founder of set theory (the first consistent theory
of infinite objects).

2. COMPUTER GENERATIONS AND
QUANTUM PHYSICS: GENERAL
DESCRIPTION

To  get  faster  computers,  we  must  decrease  the  size  of  the 
elementary  processing  elements. 

As  the  size  of  an  object  decreases,  quantum  effects  become 
more and more essential in its description:

• for macro-size objects, quantum effects are rare (e.g.,
in lasers), small, and difficult to measure;

• in chemistry (which studies molecules), quantum
effects are often important;

• for elementary particles, quantum effects are so
overwhelming that their non-quantum description is
practically impossible.

Therefore,  as  the  size  of  the  computer  elements  decreases, 
we  need  to  take  quantum  effects  into  consideration  to  a 
larger  and  larger  extent. 

To take these effects into consideration, we must use
quantum physics. In general, a physical theory describes how
particles and fields interact in space-time. Therefore, in
the ideal quantum physical theory, particles, fields, and
space-time structures must be considered from the
quantum viewpoint. In practice, the effects of their
quantization are different, so some of these quantum effects can
often be neglected:

•  the  largest  quantum  effects  are  related  to  objects 
that  have  been  known  and  analyzed  for  the  longest 
time,  i.e.,  particles; 

• the next quantum effects are related to newer
objects: fields;

• and finally, the smallest quantum effects are due to
quantization of space-time physics, physics whose
experimental effects are still on the edge of modern
observation abilities.

The smaller the objects, the more effects we need to
consider. At first, we have to use traditional quantum
mechanics (also called first quantization), in which fields (and
space-time structures) are described by non-quantum
formulas, but the particles' quantum behavior is taken into
consideration. This quantum mechanics describes atoms,
quantum chemistry, etc. Modern engineering research into
quantum dots as computer units and modern theoretical
research into quantum computing, with its exciting
potential ability to solve such hard-to-solve problems as
factoring large integers (see, e.g., [2,3,6,7,29,30]), are at this
quantization level.

• From the practical viewpoint, quantum dots have
a huge potential for further miniaturizing computers;
so, if this project is successful, we will not need to
worry about miniaturization for at least a few decades.

•  However,  from  the  fundamental  viewpoint  of  a  more 
distant  future,  we  need  to  look  further. 

To describe even smaller objects, we need to use second
quantization (or quantum field theory (QFT)), in which
both particles and fields are quantized, and the fully
quantized theory, in which space-time is quantized as well.

3.  ENTER  ACAUSAL  PROCESSES 

In general (curved) space-time, the maximum possible
communication speed (i.e., the speed of light c) is
determined by the metric tensor field $g_{ij}$; see, e.g., [15].

• In non-quantum theories, this field depends smoothly
on coordinates and, therefore, the corresponding
maximal speed changes only slightly in space
and time (and is practically constant for small areas).

• Quantization of space-time means, in particular, that
the metric tensor field undergoes quantum
fluctuations, and, as a result, the actual maximal speed
at any given point is randomly larger or randomly
smaller than c. Since the average deviation must be
0, this means, roughly speaking, that in half of the
cases, the maximal possible speed is larger than c,
and in half of the cases, it is smaller than c.

An  object  of  finite  size  is  influenced  by  the  "average"  field 
in  the  area  that  this  object  occupies. 

• If the object is large enough, then the random
fluctuations "average out", and the object moves as if in
a space where the maximal speed is the macro-world
speed of light.

•  However,  if  we  consider  much  smaller  objects,  then 
these  objects  can  actually  feel  the  local  fluctuations. 

Therefore, if this tiny object moves (and transfers
information) at a maximal local speed, and this local speed, due
to a fluctuation, is larger than c, then we get a microobject
that, without violating causality, is able to transfer
information at a speed v that is larger than the macro-level
speed of light c.

The smaller the object, the larger this potential speed v
(it can, actually, be arbitrarily large).
As a result, we have an unexpected additional boost in
computer performance:

• we considered smaller and smaller processing
elements because the smaller these elements, the faster
the computer;

• it turns out that if these elements are small enough
to require the full quantum theory, then
not only does their size get smaller, but also the actual
speed of information transfer can be made faster
than the macro-level speed of light; thus, computers
become even faster.

4. ACAUSAL PROCESSES IN PHYSICS
AND BIOLOGY: A BRIEF HISTORY

Traditional physics is causal in the sense that future events
are determined by the past state of the Universe. This
dependence can be deterministic (as in classical,
pre-quantum physics), or stochastic (as in quantum physics).
For some time, there has been an idea of the possibility
of processes in which the influence can go in the
opposite direction: the future can influence the past. Such
processes are called acausal.

The idea that the speed of all particles cannot exceed the
speed of light (and that, therefore, it is impossible to
influence the past) was one of the main ideas of Einstein's
Special Relativity Theory.

In quantum mechanics, due to its probabilistic character,
many deterministic restrictions of pre-quantum physics
become somewhat "blurred" in the sense that they
no longer prohibit some events completely, but simply
state that these formerly prohibited events have small
probability. For example, in classical physics, a particle
cannot penetrate a potential barrier if the energy of this
barrier exceeds the initial energy of the particle; in
quantum physics, however, it is quite possible (although not
highly probable) that a particle "tunnels" through this
barrier and ends up on the other side of it. This is not
simply a theoretical conclusion: this "tunnel effect" is the
basis of "tunnel diodes" that are extensively used in
today's electronics.

The discovery that quantum mechanics can make
pre-quantum restrictions "soft" led to the possibility that
causality may also be violated in quantum processes. Such
violations were first discovered by Einstein, Podolsky, and
Rosen in their famous paradox (physicists call it the EPR
paradox, after the first letters of the authors' names; for details,
see [32]). Einstein, who was not a great fan of quantum
mechanics, proposed this paradox as a way of disproving
this theory. (It is worth noticing at this point that all
experiments so far seem to confirm quantum mechanics.)
The EPR paradox does not lead to real time travel: it simply
shows that in the resulting quantum formalism, the
future state influences the past one; however, all attempts to
extract real time travel from it turned out to be futile
because, crudely speaking, the resulting influence on the past
is so small that, when we try to measure it, it "drowns" in
the inevitable quantum uncertainty of measurements.
This fact does not mean that causality is true in
quantum physics. In the last decade, several more sophisticated
schemes have been proposed that, in principle, can lead to
actual time travel [33-35].

In addition to physical arguments in favor of possible
causality violations, there exist biological motivations for
such processes: Rosen [25] suggests that living beings
can use physical processes that influence the current events
depending on the future ones (he calls such acausal
processes anticipatory; see also [24-28]).

Until 1988, acausal processes were mainly considered
as one of many possibilities, not the most probable
possibility, and not part of mainstream physics. In 1988, the
physicists' attitude to acausal processes changed when Kip
S. Thorne, the world's leading astrophysicist, published
several papers in the leading physics journal Physical
Review in which he showed that within the existing
quantum physics and cosmology, acausal processes are highly
probable; these publications led to several other serious
research results [1,9,17,18,20-22,31]. As a result of this
research, three basic types of acausal processes have been
discovered; these processes are summarized in Thorne's
monograph [32] (for more popular expositions, see, e.g.,
[4,5,12,13,19,23,36]).

5. PARADOXES OF ACAUSALITY AND
HOW NON-EQUILIBRIUM
THERMODYNAMICS CAN SOLVE THEM

The  idea  of  acausal  processes  was,  for  a  long  time,  mainly 
part  of  science  fiction,  because  this  idea  is  paradoxical. 
The best-known paradox, the father paradox, is most convincingly
described in terms of actual time travel (travel to the
past):

The  time  traveler  paradox  occurs  when  a  time  traveler  goes 
to  the  past  and  shoots  his  own  father  to  death  before  he 
himself  was  conceived.  Then: 

•  On  one  hand,  the  time  traveler  is  still  alive,  because 
he  was  alive  before  the  killing,  and  he  did  not  harm 
himself  in  any  way. 

•  On  the  other  hand,  since  his  father  has  died,  he  could 
not  have  conceived  the  time  traveler,  and  hence,  the 
time  traveler  cannot  be  born.  So,  he  is  at  the  same 
time  not  alive. 

A  similar  paradox  occurs  if  we  cannot  actually  travel  to 
the  past,  only  influence  it;  also,  it  occurs  even  if  we  have 
no  human  beings  at  all,  simply  physical  processes.  The 
reason  why  we  (and  other  authors)  present  this  paradox  in 
its time-traveler form rather than in the form of differential
equations  of  physics  is  that  when  we  have  a  problem  with 
differential  equations,  there  can  be  many  reasons  for  that 
(wrong  equations,  wrong  method  of  solution,  etc.),  while 
the  time-traveler  paradox  reveals  the  paradoxical  character 
of  acausal  processes  themselves. 

For clarity of exposition, we will describe the current
solution of the paradoxes of acausality on the same
time-traveler example as we described the paradox itself. Of
course, just as the paradox occurs even
when there is no time traveler at all, the described
solution is also applicable to the case of purely physical acausal
processes. This solution is described in [8,10,11,14,16].
Since the time traveler is alive at the time when he starts
the shooting, this means that he was conceived after all,
and therefore, that his attempt to kill his father has failed.
Why could it have failed? Well, the gun may have
malfunctioned, or he might have missed, or a brick (or a meteorite)
might have fallen on the time traveler's head at the very
moment when he was ready to shoot, or a policeman of
the past might have stopped him, etc.

Some  of  these  possibilities  are  quite  realistic,  some  (like  a 
meteorite)  have  an  extremely  low  probability.  The  time 
traveler  can  prepare  for  some  of  these  possibilities:  he 
can  check  his  gun  before  going  to  the  past,  use  automatic 
weapons, wear a hard hat against falling bricks, wear a fake
police uniform to prevent interference from the past's police,
etc. In principle, whatever possibility we describe, the time
traveler can take care of it. However, he cannot take care
of them all: for example, in principle, the gun can
malfunction simply due to some unexpected (but possible)
random Brownian motion of its molecules.
If the time traveler takes care of all possibilities with
reasonable (sufficiently high) probability, this still leaves other
possibilities, with extremely low probability, that normally
do not occur, but that would have to occur because
otherwise, we would have a paradox.

Summarizing: if an acausal process is possible, then some
events whose probability is normally extremely low
will take place in order to prevent this acausal influence from
happening. This conclusion is true not only for a time traveler,
but for an arbitrary acausal process.

6.  COMPUTERS  THAT  USE  ACAUSAL 
PROCESSES 

According  to  the  above  analysis,  the  very  possibility  of 
an  acausal  process  leads  to  the  implementation  of  highly 
improbable  events.  Let  us  assume  that  we  have  organized 
such  an  acausal  process  in  such  a  way  that  if  we  switch  it 
on, it will lead to an implementation of a highly improbable
event with a probability $p_0 \ll 1$.

Let us show how this device can be used to solve a
typical hard-to-compute problem of propositional
satisfiability: given a Boolean (propositional) formula $F(x_1, \ldots, x_n)$
with n Boolean variables $x_1, \ldots, x_n$, find the values (if any)
for which the resulting formula is true. This problem can
be easily solved by trying all $2^n$ possible combinations of
n "true" and "false" values. Unfortunately, this
exhaustive search becomes non-feasible even for $n \approx 300$, when
the resulting computation time exceeds the lifetime of the
Universe. It is known that this problem is
computationally hard (the precise term is NP-hard) in the sense that
if we can solve it in reasonable time (i.e., time bounded
by a polynomial of n), then we would be able to solve all
problems from a large class (called NP) in reasonable time,
and this most computer scientists consider impossible.
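
The exhaustive search described above can be sketched as follows. The function names, the sample formula, and the assumed rate of 10^9 checks per second are illustrative assumptions; the age of the Universe is taken as roughly 13.8 billion years.

```python
from itertools import product

AGE_OF_UNIVERSE_SECONDS = 4.35e17  # roughly 13.8 billion years

def brute_force_sat(formula, n):
    """Try all 2**n assignments; return the first satisfying one, or None."""
    for values in product([False, True], repeat=n):
        if formula(*values):
            return values
    return None

def exhaustive_search_seconds(n, checks_per_second=1e9):
    """Worst-case search time, assuming a hypothetical rate of checks per second."""
    return 2**n / checks_per_second

def demo_formula(x1, x2, x3):
    """Sample formula: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)."""
    return (x1 or x2) and (not x1 or x3) and (not x2 or not x3)

if __name__ == "__main__":
    print(brute_force_sat(demo_formula, 3))
    # for n = 300, the worst-case time vastly exceeds the age of the Universe
    print(exhaustive_search_seconds(300) > AGE_OF_UNIVERSE_SECONDS)
```

The search itself is trivial to write; the difficulty lies entirely in the number of combinations, which doubles with every added variable.
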

To find the values $x_i$ using acausal processes, we can set
up n quantum random number generators that
generate n random bits. Then, we check whether the results
$x_1, \ldots, x_n$ of these bits satisfy a given formula. If they
do, these values are the desired answer; if they do not, we
switch the above-mentioned acausal process on. Nature
has two choices:

• It can generate the desired solution in the random
generators. If the formula has only one satisfying
combination of variables (out of $2^n$), the probability
of this event is $2^{-n}$.

• It can also generate a vector that does not satisfy the
given formula. In this case, the switched-on acausal
process makes Nature implement the highly
improbable event, with probability $p_0$.

Therefore, if $p_0 \ll 2^{-n}$, it is much more probable that
Nature will prefer the first alternative.
This  idea  was  announced  in  [11,14]  and  described  in  detail 
in  [10], 
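
The classical baseline that this section argues against, exhaustive search over all $2^n$ assignments, can be sketched as follows. This is only an illustration: the function names, the example formula, and the assumed rate of $10^{12}$ checks per second are not from the paper.

```python
from itertools import product


def brute_force_sat(formula, n):
    """Try all 2**n truth assignments; return the first satisfying one, or None."""
    for values in product((False, True), repeat=n):
        if formula(*values):
            return values
    return None


def exhaustive_search_years(n, checks_per_second=1e12):
    """Worst-case time, in years, to test all 2**n assignments.

    checks_per_second is an assumed, illustrative hardware speed.
    """
    seconds_per_year = 365.25 * 24 * 3600
    return 2 ** n / checks_per_second / seconds_per_year


def single_guess_probability(n):
    """Probability that n random bits hit the only satisfying assignment."""
    return 2.0 ** (-n)


# Hypothetical 3-variable formula: (x1 or x2) and (not x1 or x3) and (not x2)
def example(x1, x2, x3):
    return (x1 or x2) and ((not x1) or x3) and (not x2)


solution = brute_force_sat(example, 3)
years_for_300 = exhaustive_search_years(300)
```

Under the assumed speed, `years_for_300` is many orders of magnitude larger than the age of the Universe (roughly $1.4 \times 10^{10}$ years), which is the infeasibility the text refers to; `single_guess_probability(n)` is the $2^{-n}$ that $p_0$ is compared against.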

7.  ACAUSAL  PROCESSES  LEAD  TO 
GRANULARITY  OF  THE  PHYSICAL 
WORLD 

In the previous section, we applied the idea of acausal processes
to artificially designed computations.
However, we can also apply it to nature itself.

•  In traditional causal physics, whatever initial conditions
$x(t_0)$ we set at the initial moment of time $t_0$, we
can always integrate the equations and end up with
the state of the Universe $x(t) = F(t, x(t_0))$ for all
subsequent moments of time $t > t_0$ (here, the function
$F$ describes the dynamics of the system). In this
case, initial conditions are arbitrary and therefore, we
have a continuous set of possible states.

•  If there is an acausal process present, then the initial
condition cannot be arbitrary: if, e.g., we have an
acausal process that transfers a part $p(x(t))$ of a
state at moment $t$ to an earlier moment of time $t' < t$, then,
in addition to the dynamical equations that connect
$x(t)$ and $x(t')$ with $x(t_0)$, we must have an additional
condition $p(x(t)) = p(x(t'))$. Therefore, the initial
condition $x(t_0)$ must satisfy the additional equation
$p(F(t, x(t_0))) = p(F(t', x(t_0)))$.

How does an additional equation restrict the set of all
possible conditions? For the case of one variable, a linear
equation has a single solution, a quadratic equation has,
in general, two solutions, etc. The more complicated the
equations, the more granular is the set of their solutions.
Since the dynamic equations are, usually, very complicated,
we naturally expect that the additional equations
caused by acausal processes lead to a high granularity of
the Universe.
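
A one-variable toy case may make the granularity argument concrete. The dynamics below are an assumption chosen purely for illustration (they are not from the paper): a point moving with constant velocity $\sin x(t_0)$, with $p$ taken to be the identity.

```latex
% Toy dynamics (assumed for illustration only)
x(t) = F(t, x(t_0)) = x(t_0) + (t - t_0)\,\sin x(t_0).

% Acausal condition p(x(t)) = p(x(t')) with p the identity and t' < t:
x(t_0) + (t - t_0)\,\sin x(t_0) = x(t_0) + (t' - t_0)\,\sin x(t_0)
\;\Longrightarrow\; (t - t')\,\sin x(t_0) = 0
\;\Longrightarrow\; x(t_0) = k\pi, \quad k \in \mathbb{Z}.
```

Without the acausal condition any real $x(t_0)$ is admissible; with it, only the discrete set $\{k\pi\}$ remains, which is the kind of granularity the section describes.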

8.  ASTROPHYSICAL  APPLICATIONS 
OF  ACAUSALITY 

A general idea of these applications. As we have mentioned,
acausal processes lead to highly improbable events.
According to statistical physics, if Nature has a choice, it
would rather prefer situations where these highly improbable
events do not occur. Therefore, if there is a random
(statistical physics-type) process that can either lead to
an acausal process or not, then the actual probability of
this process resulting in acausality is very low, much lower
than it would have been if we did not take the possibility
of acausal processes into consideration (such an outcome is,
in practice, thermodynamically impossible).

In this paper, we illustrate this idea with only one possible
application, whose description enables us to avoid technical
details; other applications are also possible (see, e.g.,
[8,11]).

Example: The isotropization of the Universe. One
of the main problems of modern cosmology (see, e.g., [32])
is that the Universe is too isotropic. On a large scale, in
all directions in which we look, we see the same statistical
distribution of matter. The initial state of the Universe
was, according to the modern physical viewpoint, random,
and therefore, far from being isotropic. Hence, the observable
isotropization is due to some physical processes.
Many physical processes shuffle matter around and thus
contribute to the isotropization, but calculations show that
during the lifetime of our Universe, these processes are not
sufficient to explain the current isotropy; to be more precise,
for random initial conditions, the probability of the
initial conditions that lead to the observed isotropy is very
low.

The explanation of this phenomenon in acausal physics is
as follows: Anisotropy means that different distant areas
of the Universe will have radically different matter densities.
For acausal processes, there is no speed restriction;
therefore, since there is an excess of matter in one area and
a shortage in another area, acausal processes will re-shuffle
the matter from the dense area to the area where matter
is scarce. Such a process is, as we have mentioned,
thermodynamically improbable and therefore, it is much more
probable that the random initial conditions are chosen in
such a way that prevents these acausal re-shufflings, i.e.,
that the initial conditions lead to the observable isotropic
Universe.

What this explanation shows is that the probability
of an initial state of the Universe leading to isotropization,
a probability that is small if we do not take acausal
processes into consideration, becomes much larger if we
consider the possibility of acausal processes.

ABSTRACT

Determination  of  terrain-based  decision  drivers 
operative  in  enabling  the  tactical  commander  'to  see' 
the  battlefield  represents  a  critical  objective  for 
soldier  training  as  well  as  implementation  of 
autonomous  land  traverse  systems.  At  process  level, 
the  effort  unveils  obtained  waypoints  in  the 
development  dynamic  for  knowledge  evolution. 
Based  on  manipulation  of  distance  factors  among 
maneuver  (combatant)  elements  and  terrain  features 
in  a  mission  execution  context  within  a  simulations 
arena,  an  aggressive  methodology  is  uncovered  for 
extracting  participatory  knowledge  elements,  and  a 
differentiation  scheme  is  disclosed  for  realizing 
(reporting)  pattern-element  significance  applicable  in 
tactical  decision  development.  A  resource  is 
suggested  for  detailed  analysis  of  'deep'  decision 
factors  and  their  scheme  of  application  in  land 
traverse  and  tactical  maneuver  situations. 

KEYWORDS:  tactical  decision  logic,  virtual 
simulation,  constructive  simulation,  JANUS, 
Distributed  Interactive  Simulation  (DIS) 

1.  INTRODUCTION 

To  'see'  the  battlefield  represents  a  critical 
capability  for  the  tactical  commander  as  well  as, 
in  today's  technologically  advanced  culture,  for 
autonomous  systems  intended  to  maintain  pace 
with  or  substitute  for  manned  land  traverse 
systems.  To  'see'  the  battlefield  is  to  grasp  not 
just  the  presence  and  location  of  unfolding 
elements  in  an  immediate  field  of  view  but,  for 
the  tactical  commander,  to  appreciate  the 
relevance  of  near  and  remote  terrain  features  for 
tactical  advantage  and  identify  zones  of 
potential  action  and  denial  areas  for  enemy 
activity.  It  is  to  mentally  prepare  for  emerging 
events  by  identifying  the  utility  of  successive 
terrain  features  to  achieve  reduction  of  risk, 
extended  optimization  of  element  positioning, 
and  expeditious  accomplishment  of  the  mission. 


The  projected  research  explores  the  elements  of 
awareness  implicit  in  'seeing'  the  battlefield 
under  alternative  tactical  conditions. 

In  approaching  decision  logic  elements  for 
'seeing'  the  battlefield,  the  projected  effort  turns 
to  simulation  capabilities  for  circumstance 
creation  and  observation  of  battle  commander 
behaviors  imposed  during  review  of  displayed 
evolving  events.  In  this  combination,  the 
research  discloses  a  methodology  for  using 
simulation  to  identify  'deep  significance'  among 
terrain  details  for  tactical  maneuver  and  soldier 
preparation,  and  specifically,  focuses  interest  on 
the  impact  of  providing  virtual-simulation 
terrain  detail  to  tactical  commanders  and  staff 
during  JANUS  constructive-simulation  exercises 
(see  system  description  below),  while  probing 
the  relevance  of  entity  (terrain  feature  or  tactical 
element)  separations  for  situational-awareness 
during  those  exercises.  The  interaction  of 
JANUS  constructive-simulation  technology  with 
DIS  (Distributed  Interactive  Simulation)  virtual 
simulation  lies  at  the  heart  of  the  investigative 
approach. 

2.  SYSTEM  DESCRIPTIONS 

The  JANUS  model  is  a  man-in-the-loop,  force- 
on-force,  stochastic  model  which  provides 
schematic  representation  of  the  battlefield  to 
unit  staff  members  who  control  the  action  from 
individual  display  consoles.  Terrain 
representation  is  limited  to  contour  indicators 
overlaying  a  latitude-longitude  grid  and  shaded 
with  colored  areas  indicative  of  vegetation, 
water,  and  urban  development.  Roads,  rivers, 
obstacles,  and  man-made  features  are  portrayed, 
as  are  tactical  vehicle  icons.  Terrain  elevation 
profiles  along  single  selected  axes  can  be 
displayed  successively  in  a  superimposed 
window. 

Virtual imagery at the Battlefield Distributed
Simulation  -  Developmental  (BDS-D)  facility 
offers  an  out-the-window  representation  of 
tactical  and  environmental  features  in  a  man-in- 
the-loop,  force-on-force  context.  Battlefield 
entities  are  portrayed  in  a  contextually  accurate 
fashion  relative  to  size,  range,  orientation, 
intervisibility,  and  relative  spatial  separation. 
Displayed  fields  of  view  and  entity  state 
modifications  are  operator  controlled  through 
representational  mechanisms  reflecting  actual 
user  equipment. 

Despite  the  JANUS  capability  of  portraying 
profile  arrays  along  selected  terrain  azimuths  in 
a  superimposed  display  window,  staff  training 
with  such  resources  provides  no  direct  view  of 
the  relevant  terrain  from  selected  viewing  points 
(vehicles).  This  denies  appropriate  staff  officers 
terrain-use  details  potentially  beneficial  to  the 
operation,  and  limits  investigation  of  tactical 
decision  logic  to  broad  force-maneuver  concerns 
exclusive  of  terrain-characteristics-driven 
tactical  movement  issues.  Through  linkage  of 
constructive  schematic  displays  and  virtual 
tactical  arrays  in  a  mission  execution  context, 
analysis  of  the  'deep  significance'  of 
confrontational  and  cooperative  terrain  details 
for  tactical  maneuver  and  soldier  preparation  is 
operationalized. 

3.  EXPERIMENTAL  CONCEPT 

Through  linked  virtual  and  constructive 
simulation  technologies  including  manned- 
simulators,  semi-automated  force  generators 
(ModSAF  systems),  and  JANUS  workstations  to 
represent  an  indirect-fire-supported,  staff- 
supported  mounted  maneuver  element 
advancing  toward  a  designated  objective, 
exploration  of  knowledge  elements  operative  in 
'seeing'  the  battlefield  will  be  undertaken  at  the 
Mounted  Warfare  Test  Bed  (MWTB),  Ft  Knox, 
Ky.  Staff  officers  at  JANUS  workstations 
representing  'global'  tactical  events  and 
augmented  with  virtual  terrain  imagery 
synchronized  for  current  battlefield  conditions 
and  provided  through  imagery  transmissions 
from  manned-simulator-based  maneuver 
elements  will  direct  maneuver  and  fire  support 
activities  for  an  advancing  armor  force,  while 
vehicle  commanders  in  manned  tank-simulators 
with access to schematic JANUS arrays
participate as maneuver elements. In each
exercise,  the  maneuver  force  supported  by 
ModSAF-generated  additional  support  units  (an 
artillery  firing  battery)  for  tactical  realism  will 
advance  over  European  terrain  as  JANUS-based 
staff  officers  (a  battalion  commander,  an 
operations  officer  [S3],  and  a  fire  support  officer 
[FSO]) direct tactical support for the mission.

1. Tactical Context.

With  poor  threat  intelligence,  BLUFOR  (i.e.,  the 
friendly  maneuver  force)  will  advance  across  a 
well-vegetated  hilly  European  environment, 
maintaining  maximum  use  of  concealment  and 
relying  primarily  on  indirect  fire  (artillery)  to 
respond  at  threat  encounters.  The  remotely 
located  BLUFOR  battalion  staff  will  provide 
axis-of-advance  and  fire-support  information  to 
the  BLUFOR  maneuver  commander, 
designating  determined  terrain  advantages  for 
route  selection  and  indirect-fire  attrition  zones 
from  provided  displays.  BLUFOR  vehicle 
commanders  will  fashion  and  execute  micro- 
maneuvers  based  on  JANUS  and  virtual 
representations,  disclosing  across  scenarios, 
terrain-interpretation  strategies  for  local 
maneuver  demands.  BLUFOR  will  encounter 
OPFOR  [i.e.,  opposing  force  or  Threat]  at 
selected,  test-directorate-controlled 
BLUFOR/OPFOR  distances  from  appointed 
terrain  features,  to  assess  'distance  participation' 
as  a  contextual  factor  in  decision  logic.  Interest 
in  why  an  action  is  selected  based  on  aggregated 
'distance'  factors,  as  opposed  to  how  an  action  is 
performed,  underlies  the  inquiry. 

2.  Execution. 

Each  scenario  will  be  performed  twice,  once 
with  virtual  and  constructive  screen- 
augmentation  at  respective  locations  and  once 
without.  Training  effectiveness  and  decision 
logic  will  be  explored  under  each  condition  in 
the  series  of  scenarios  portraying  alternative 
inter-feature  and  inter-combatant  distances  as 
primary  decision-drivers.  Achievement  of  this 
disclosure  will  be  done  by  comparing  BLUFOR 
leader-directives  developed  with  JANUS 
schematics  (regarded  as  aggregates  of  distances, 
locations,  and  features),  and  those  provided  with 
JANUS  schematics  synchronized  with  virtual- 
array  'intervention  elements'  descriptive,  in  an 
'aspected'  array,  of  spatially  serial  features  to  be 
dealt with (disruptive [obstacles] and desirable
[concealment]) on the represented terrain array
(battlefield).  'Seeing  the  battlefield',  currently 
addressed  through  relevant  feature/combatant 
separations,  will  be  examined  through 
directorate-imposed  distances  among  display 
elements.  (OPFOR  elements  will  emerge  from 
selected  hidden  start-points  under  the  direction 
of  test  directorate  personnel  to  create  BLUFOR 
decision  circumstances  consistent  with  study 
objectives.  BLUFOR  display  options  will  reflect 
associations  of  BLUFOR  remoteness-to-terrain- 
features  relative  to  OPFOR-remoteness-to- 
additional-[or the same]-features, and will serve
as  the  basis  for  knowledge-element 
characterization  effective  in  bringing  about 
commander  decisions.)  Performance  will  be 
examined  for  the  'intervention  logic'  applied  at 
the  S3,  the  battalion  commander,  the  FSO,  and 
the  maneuver  commander  duty  positions. 
Training  enhancements  will  be  assessed  through 
performance  comparisons  across  early  and  late 
trials  with  both  JANUS  and  JANUS-augmented 
display  data.  With  OPFOR  elements  reflecting 
ground  and  air  units  moving  along  single, 
multiple,  and  air  corridors  in  separate  events, 
with  various  delays,  distances,  and  locations 
relative  to  BLUFOR,  at  BLUFOR  threat 
observation,  data  loggers  will  capture  objective 
data  for  a  variety  of  tactical  circumstances. 

3. Player Functionality.

Battalion  Staff.  BLUFOR  battalion  staff 
personnel  will  direct  maneuver  and  fire 
direction  in  geographically  similar  but  tactically 
diverse  operational  situations.  Battalion 
commanders  will  formulate  orders  positioning 
friendly  forces  against  observed  threat  elements 
based  in  part  on  'intervention'  configurations; 
S3s  will  fabricate  traverse  and  kill-zone 
guidelines,  also  from  'intervention' 
configurations  ('traverse  logic');  and  FSOs  will 
fashion  fire  support  and  kill-zone  alternatives 
again  in  part  from  'intervention'  logic.  From 
displays  descriptive  of  designated  distances 
among  terrain  features  and  combatant  forces, 
staff  personnel  will  exercise  the  option 
(decision)  to  apply  or  bypass  terrain  features  for 
tactical  advantage,  and  investigators  will  be 
provided  a  metric  (distance  at  which 
circumstance-import  changes  from  beneficial  to 
detrimental)  by  which  interpretation  strategies 
can be specified. Early-late comparisons of
identical tactical configurations will address
training  benefits  associated  with  display 
augmentation. 
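
The distance metric described above (the separation at which a terrain feature's import changes from beneficial to detrimental) could be estimated from logged apply/bypass decisions roughly as follows. This is a minimal sketch under strong assumptions that are not in the paper: a hypothetical record format of `(distance, applied)` pairs, a single switch from "apply" at short range to "bypass" at long range, and made-up sample values.

```python
def crossover_distance(trials):
    """Estimate the distance at which staff stop applying a terrain feature.

    trials: iterable of (distance, applied) pairs (hypothetical format), where
    applied is True if the feature was used and False if it was bypassed.
    Returns the midpoint between the farthest 'applied' distance and the
    nearest larger 'bypassed' distance, or None if no such crossover exists.
    """
    records = list(trials)
    applied = [d for d, used in records if used]
    bypassed = [d for d, used in records if not used]
    if not applied or not bypassed:
        return None
    farthest_applied = max(applied)
    beyond = [d for d in bypassed if d > farthest_applied]
    if not beyond:
        return None
    return (farthest_applied + min(beyond)) / 2.0


# Made-up illustrative records: distances in metres, decision flags.
sample = [(200, True), (400, True), (600, False), (800, False)]
estimate = crossover_distance(sample)
```

Comparing such estimates between the JANUS-only and JANUS-augmented conditions, or between early and late trials, is one way the early-late comparisons mentioned above might be quantified.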

Vehicle  Commander  Operations.  Maneuver 
commanders  serving  as  a  platoon  leader  and  his 
'wingman'  will  advance  using  maximum 
concealment  (to  enhance  survivability)  on  a 
designated  objective  in  each  of  the  tactical 
scenarios.  At  threat  observation,  BLUFOR 
initially  will  request  indirect  fire  to  suppress  the 
OPFOR  and  will  maneuver  to  achieve  terrain 
advantage  for  a  possible  subsequent  direct  fire 
engagement.  Commanders  again  will  use 
'intervention'  details  provided  in  virtual  and 
JANUS  displays  to  select  and  execute  micro- 
maneuvers,  and  will  disclose,  through  display 
options,  the  relevance  of  pertinent  distance 
factors  for  movement  decisions.  This  will 
provide  investigators  additional  metric  insights 
for  interpretation  strategies  at  the  site  level  of 
maneuver  operations. 

4.  PROJECTION 

Assessment  of  terrain-detail  significance  for 
tactical  maneuver  and  soldier  training  through 
simulation  technologies  and  controlled 
stimulation  events  unveils  an  exciting  and 
robust  procedure  for  'deep  significance' 
determination  at  multiple  levels  of  concept 
formation  and  in  diverse  operational 
circumstances.  Though  currently  structured  for 
distance  relevance  extraction  in  platoon 
maneuvers,  the  present  application,  and  each 
succeeding  application,  essentially  serves  as  a 
solitary  designation  in  the  larger  requirements- 
determination  quandary  for  autonomous 
behavior,  and  conveys  receptivity  to  easy 
reconfiguration,  by  echelon,  combat  activity, 
battlefield  resources,  support  activity,  threat 
activity,  or  other  decision  drivers,  to  assess 
distance  relevance  impacted  by  additional 
factors  in  platoon/unit  maneuvers.  The 
significance of abundant subtask-level decisions
which  necessarily  underpin  mission 
performance, and the importance and complexity
of  seat-specific,  soldier  decision  logic  and  input, 
which  again  underlie  duty  performance,  also  are 
highlighted  in  the  approach. 

Given  the  methodology  as  a  convincing  solution 
for extracting and characterizing significance/
meaning from observed circumstances, the
approach,  through  combined  virtual  and 
constructive  simulation  application,  ultimately 
provides  the  critical  element  of  authenticity  to 
the  development  of  significance-measures, 
essentially  through  analysis  in  a  mission 
execution  context.  In  capturing  such  measures 
in  this  manner,  a  broad  operational  context  is 
allowed  to  function  as  backdrop  and  source  of 
serial  and  spatial  depth  for  more  effective 
assessment  of  proximal  and  distal  significance 
operators. 
Benefit Documentation

Rob Frank, EPRI I&C Center
Joe  Weiss,  EPRI 


ABSTRACT 


Many  utilities  have  been  contemplating  a  control 
system  retrofit  for  their  fossil  power  plants. 
However,  in  many  cases,  the  benefit  justification 
has  been  lacking.  Consequently,  EPRI  and  TVA 
formed  a  tailored  collaboration  project  to  quantify 
the  benefits  of  implementing  a  distributed  control 
system  (DCS)  retrofit  in  a  1955-vintage  coal-fired 
power  plant.  TVA's  Kingston  Fossil  Plant  Unit  9 
was  the  selected  unit  for  the  project.  A 
comprehensive  benchmarking  program  was 
established  before  the  changeout  and  is  continuing. 
This  paper  provides  a  complete  detailing  of  the 
retrofit  and  the  documentation  of  benefits  to  date. 
Benefits  documented  to  date  include  reduced 
emissions (NOx and CO2), heat rate
improvement,  reduced  unburned  carbon,  improved 
dispatch  capability,  maintenance  cost  reduction, 
and  other  factors.  To  date,  heat  rate,  unburned 
carbon,  dispatch  improvements  and  NOx 
reduction  appear  to  be  significant. 


BACKGROUND 


Fossil  plant  operation  is  changing.  This  includes 
the  move  from  base  load  operation  to  cycling 
operation  and  meeting  the  emission  requirements 
of  the  Clean  Air  Act  Amendments.  Many  utilities 
are  implementing  Distributed  Control  System 
(DCS)  retrofits  to  meet  these  new  challenges. 
Many  other  utilities  have  been  seeking 
cost/benefit  data  to  justify  a  control  retrofit.  To 
date,  most  of  the  economic  justification  for 
control  system  retrofits  has  been  heat  rate 
improvement.  However,  a  DCS  should  be  able  to 
provide  other  benefits  including  improved  plant 
reliability/availability, improved dispatch
response, reduced emissions, salability of wastes,
and more. Additionally, since control retrofits
usually occur during an outage when other
improvements/modifications also are being
performed (e.g., turbine modifications), it has been
difficult to estimate the direct benefits from the
control retrofit. The purpose of this project was
to provide a comprehensive benchmarking of
control retrofit benefits including all potential
plant improvements. Kingston Unit 9 is a 200
MW, CE-tangential-fired, twin-furnace boiler; a
typical 1950's vintage coal-fired plant with
electric analog controls which is currently being
dispatched by Automatic Generation Control
(AGC).


APPROACH 


EPRI and TVA studied the feasibility of
retrofitting all nine of the Kingston plant control
systems with DCSs. The feasibility study included
evaluating the existing instrumentation, actuators,
and plant support systems as well as the control
system. EPRI Controls Retrofit Guidelines were
used in the feasibility study. Subsequent to the
feasibility study, it was decided to retrofit only one
of the units (Unit 9) and compare the benefits of
the retrofit to the other eight non-retrofitted
units.


SCOPE  OF  WORK 


As  a  result  of  a  study,  several  TVA  Task  Teams 
identified  the  need  for  a  state-of-the-art  control 
system  upgrade  at  the  Kingston  Fossil  Plant. 
Through  a  Tailored  Collaboration  (TC) 
agreement, TVA and EPRI planned for the
installation of a full Distributed Control System
(DCS)  for  Kingston  Unit  9,  the  I&C  Center  to  be 
established  at  the  Kingston  Plant,  the  installation 
of  a  high  fidelity  simulator  within  the  center  for 
training  and  advanced  control  system  algorithm 
development,  and  the  documentation  of  benefits 
as  a  result  of  the  retrofit.  Another  project 
deliverable  was  the  detailed  documentation  of  the 
project  benefits. 


PROJECT  DESCRIPTION 


The project was done in three phases. The Phase
I study was performed by Gilbert Commonwealth
under contract with TVA's Fossil Engineering
Services (FES) group. Overall project
management for Phase I was provided by TVA's
Technology Advancement (TA) organization
which at the time was called Research &
Development. The objective of Phase I was to
develop a conceptual state-of-the-art design basis
and cost estimate. Phase I was completed in
February 1994. The Phase I study provided a
detailed scope of work and an estimate for Phase
II. Phase II was the detailed engineering and
design of the project and procurement of the DCS
system. During this phase of the project,
specifications were prepared by the project team
for the DCS, simulator and long-lead items. The
DCS contract and a contract for design were
awarded to Foxboro. Detailed design was
performed and long-lead items were procured
during this phase. During Phase III, the DCS
upgrade was performed on Unit 9, the unit was
commissioned and testing was performed to gather
the data needed to document the benefits.

The  Unit  9  controls  upgrade  consisted  of  a 
complete  renovation  of  the  control  room  and  the 
installation  of  a  CRT-based  control  room 
environment.  The  bench  and  vertical  boards, 
including  the  annunciators,  were  removed  and  the 
floor openings filled. The control room was
redesigned to accommodate the CRT concept. All
field  sensors,  actuators,  and  several  of  the  valves 
were  replaced,  including  field  cables.  Flame 
scanners,  high  energy  ignitors,  and  a  new  oil 
supply  system  were  installed  to  provide  additional 
safety  and  turndown  capability.  Secondary  air 
damper actuators were installed to provide the
capability to adjust the fuel-air ratio at the burner
nozzle  area  for  NOx  control.  The  DCS  system  was 
designed  to  include  network  communications  with 
the  TVA  network  to  facilitate  access  to  the  unit 
performance  and  statistical  data  from  the  plant 
network. An EPRI Performance Monitor
Workstation (PMW) was installed to provide
real-time performance evaluation for the unit. The
majority of Phase III activities were completed by
the time the unit was restarted in December 1995.
The  post  baseline  test  was  performed  to  observe 
the  improvements  due  only  to  the  DCS  upgrade. 
In addition to the baseline test, a series of
parametric tests were performed to begin
optimizing the unit. An experimental design was
used  to  determine  the  combinations  of  conditions 
for  NOx  optimization  testing. 
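
The paper does not state which experimental design was used or which factors were varied. As a hedged illustration only, the simplest way to enumerate "combinations of conditions" is a full factorial over a few factors; the factor names and levels below are hypothetical placeholders, not the Unit 9 test matrix.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]


# Hypothetical factors and levels, for illustration only.
factors = {
    "excess_o2": ["low", "high"],
    "secondary_air_damper": ["closed", "mid", "open"],
    "load": ["low", "high"],
}
runs = full_factorial(factors)  # 2 * 3 * 2 = 12 test conditions
```

In practice a fractional design is often preferred when each test run is expensive; the full factorial is shown only because it is the easiest to read.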

The  EPRI  I&C  Center  was  established  during  this 
project.  Construction  of  the  I&C  Center  was 
completed  in  August  of  1995.  The  I&C  Center 
was  dedicated  on  February  29,  1996.  The  new 
simulator  was  available  for  operator  training 
during  the  Unit  9  outage  for  the  installation  of  the 
new  controls,  and  all  Unit  9  operators  went 
through a 5-week training course prior to startup.
The  simulator  is  housed  in  the  I&C  Center 
building. 


PROJECT  RESULTS 


Listed  below  are  the  major  results  of  the  DCS 
retrofit  on  Kingston  Unit  9.  These  benefits  were 
primarily derived by two methods: baseline testing
and comparative before and after operational data.
There were three baseline tests performed. The
first  test  performed  was  a  true  baseline  in  that  no 
retrofit  work  or  pulverizer  improvements  had 
been  implemented  prior  to  the  test.  The  second 
baseline  was  performed  after  extensive  fuel 
balancing  work  and  pulverizer  blueprinting  work 
had  been  performed.  The  final  baseline  test  was 
performed  after  the  controls  retrofit  was 
completed.  The  benefits  listed  here  are  the 
benefits  obtained  solely  from  the  DCS  retrofit  and 
independent  of  other  plant  modifications  (e.g., 
pulverizer  and  turbine  modifications),  i.e.  the  delta 
between the second and final baseline tests.

The heat rate has improved by 250
Btu/kWh as a result of the controls retrofit
due in part to the following:

•  a 1.47% boiler efficiency improvement

•  an  LOI  reduction  from  ~  6%  to  ~  3% 

•  improved  steam  temperature  controls 

•  a  reduction  in  station  service  usage 

•  reduced  excess  air  (enabled  by  better 
O2 control)

•  closed  loop  primary  air  control. 

This  project  demonstrated  that  this  unit  and 
similar  ABB  CE  units  can  be  retrofitted  with  a 
DCS  and  tuned  to  reduce  NOx  emissions  by 
approximately  25%. 

Due  to  tighter  steam  temperature  controls,  the 
Plant  Forced  Outage  Rate  is  projected  to  be 
reduced by one forced outage per year.

This project has demonstrated that Unit 9
with  the  new  DCS  is  capable  of  regulation  at 
3%  per  minute  or  better  over  a  50%  load 
range.  Prior  to  the  retrofit,  Unit  9  was  limited 
to  1%  per  minute. 

Improved steam temperature controls could
allow the unit to return to steam temperatures
of 1025 °F from 1000 °F, while providing
greater protection from excursions and
reduced boiler and turbine damage. Negotiations
are under way with the plant operations
staff to raise these steam temperature
setpoints.

The unit carbon loss (LOI) was reduced from
6% down to 3%. This is a measure of the
improvement  in  the  combustion  process  and 
the  addition  of  primary  air  controls  on  the 
pulverizers. 

Startup  time  for  Unit  9  has  been  reduced  by  at 
least 2 hours due to improved ignitors and
low-load temperature controls.

Velocity-induced wear on fuel system
components  from  pulverizer  to  burner  nozzle 
was  reduced  by  35%  because  of  reduced 
primary  air  flow. 

The  environmental  emissions  have  been 
reduced  by  improving  the  unit  heat  rate  and 
burning  approximately  8,000  tons  less  coal 
per  year  on  Unit  9. 

The life of Unit 9 has been extended by
replacing the 40-year-old field cables and other
plant equipment.
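
To put the dispatch result above in perspective, the time needed to traverse a load range at a given regulation rate is simple arithmetic. The sketch below assumes, as the paper does not state explicitly, that both the rate and the range are expressed as percentages of the same rated load and that the pre-retrofit 1% per minute limit applied over the same 50% range.

```python
def ramp_minutes(load_range_pct, rate_pct_per_min):
    """Minutes needed to move across a load range at a constant ramp rate."""
    if rate_pct_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return load_range_pct / rate_pct_per_min


before_retrofit = ramp_minutes(50, 1)  # 50 minutes at 1% per minute
after_retrofit = ramp_minutes(50, 3)   # about 16.7 minutes at 3% per minute
```

Under these assumptions the retrofit cuts the time to sweep the 50% range to one third of its former value.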


PROJECT  BENEFITS 


The benefit details are shown in Table 1, and
indicate the planned vs. actual benefits. Benefits
on a $/year basis are shown. The benefits listed
are  based  on  an  80%  capacity  factor  and  a  15% 
discount  rate.  Some  of  the  actual  cost  numbers  are 
derived  from  financial  figures  deemed  confidential 
by  the  TVA  and  cannot  be  disclosed. 
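
The way such figures combine can be sketched with the numbers quoted in this paper: a 200 MW unit, an 80% capacity factor, a 250 Btu/kWh heat rate improvement, and a 15% discount rate. Everything else here is an assumption for illustration: the 200 MW rating is treated as the output basis, the evaluation horizon `years` is a free parameter (the paper gives none), and no conversion to dollars or tons of coal is attempted because fuel price and heating value are not given in this section.

```python
HOURS_PER_YEAR = 8760


def annual_generation_kwh(capacity_mw, capacity_factor):
    """Energy generated per year at the given rating and capacity factor."""
    return capacity_mw * 1000 * HOURS_PER_YEAR * capacity_factor


def annual_heat_saved_btu(capacity_mw, capacity_factor, heat_rate_gain_btu_per_kwh):
    """Fuel heat input avoided per year by a heat rate improvement."""
    return annual_generation_kwh(capacity_mw, capacity_factor) * heat_rate_gain_btu_per_kwh


def present_value(annual_benefit, discount_rate, years):
    """Present value of a constant annual benefit received at each year end."""
    return sum(annual_benefit / (1 + discount_rate) ** k for k in range(1, years + 1))


generation = annual_generation_kwh(200, 0.8)       # about 1.4e9 kWh per year
heat_saved = annual_heat_saved_btu(200, 0.8, 250)  # about 3.5e11 Btu per year
```

Multiplying `heat_saved` by a fuel cost in $/Btu would give an annual dollar benefit, which `present_value` can then discount at 15% over whatever horizon the analyst chooses.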


LESSONS LEARNED


Listed  below  are  the  major  lessons  learned  which 
came  out  of  this  project: 

1.  The  Project  Team 

This team was responsible for budget, schedule,
and overall coordination of project activities.
This project was managed much differently than
most TVA projects. A Project Team was
established and reported to the Kingston Plant
Manager. The project was managed with a site
perspective instead of the usual central office
Design Engineering approach. The Project Team
selected was directly involved in preparation,
review, award and management of equipment and
construction fixed price contracts. Most members
had a strong instrument or controls background.

The  Project  Team  approach  was  very  successful  in 
controlling  the  project  and  resolving  problems  as 
they  arose.  The  project  was  completed  utilizing 
fixed  price  contracts  and  completed  on  schedule 
with  almost  no  scope  changes.  A  design  package 
has  been  completed  to  be  used  for  future  DCS 
retrofits.  This  will  reduce  the  cost  considerably  for 
design,  construction,  and  implementation  of 
similar  projects  in  the  future.  The  continuity  of 
responsibilities  achieved  by  a  single  party  being 
responsible  for  design,  equipment  supply, 
construction  and  startup  of  the  unit  was  a  very 
valuable  asset. 

2.  Plant  Ownership 

This project has clearly demonstrated the value of
plant involvement in every step of the project,
from conception and justification to design,
construction, startup, and benefits documentation.
Involvement at Kingston extended from the plant
manager through the entire staff, including Trades
and Labor personnel who assisted in the checkout
and startup.

3.  Operations  and  Graphics  Design 

Plant  operators  designed  the  pictorial  graphics 
that  were  used  on  Unit  9  and  the  Simulator.  The 
use  of  existing  operations  staff  to  build  the  Data 
Acquisition-type  graphics  fostered  considerable 
ownership  of  the  project  from  the  operations 
personnel at the site. Operators were selected on
a voluntary basis for the initial start-up and
subsequent operation of the new system.
Operators  were  encouraged  to  comment  on 
graphics  being  developed  and  suggestions  were 
implemented  when  appropriate.  This  gave 
operators  ownership  and  satisfaction  in  being  part 
of  the  "design." 

4.  Pulverizer  Primary  Air  Flow  Measurement 
And  Control 

Install  primary  air  flow  measuring  and  control 
equipment on all CE pulverizers. The cost
(approximately $12k/pulverizer on this project)
and  specifications  for  each  installation  are  site 
specific  and  will  require  a  study  to  determine  the 
technical  requirements.  The  Unit  9  project  has 
demonstrated  that  primary  air  control  is  a  major 
key  in  the  boiler  optimization  process. 

5. Furnace O2 Balance

It was decided to try to control the O2 leaving
each furnace and the windbox-to-furnace
differential pressure using only the secondary air
dampers. This strategy has worked well
throughout the load range with no apparent
disadvantages. It is a recommendation of this
project that any ABB CE Twin-Furnace unit with
automated secondary air dampers employ this
strategy for steadier control of O2, better O2
balance between furnaces, and good windbox-to-
furnace differential pressure control.

6.  Boiler  Safety/Carbon  Monoxide 
Monitoring 

Install boiler carbon monoxide (CO) monitors on
all boilers that must optimize for NOx emissions.
CO is a better indicator of incomplete combustion
than excess O2 when operating at very low O2
levels. The unit must operate in a region that has
no margin for error.

7.  Use  of  experimental  design  and  model  to 
expedite  performance  and  NOx  optimization 
testing 

Statistical  experimental  design  techniques  were 
used  to  select  specific  test  points  for  NOx 
minimization  and  plant  optimization  testing. 

8.  Fuel  System  Improvements 

There  were  several  lessons  learned  dealing  with 
pulverizer  optimization.  Coal  pipe  rifflers  play  a 
very  important  role  in  providing  equal  fuel 
distribution  to  all  coal  pipes.  Rifflers  should  be 
checked  for  primary  and  secondary  alignment, 
excessive  wear,  and  size  variations  which  can  cause 
flow  to  bypass  the  rifflers.  All  of  these  items 
affect  fuel  balance  and  unit  optimization  for 
emissions  and  efficiency  control. 

Tests  and  discussions  with  ABB  CE  experts  have 
confirmed  that  air  in-leakage  on  Raymond  bowl 
mills  is  approximately  20  to  25%.   This  is  normal 
on  all  negative  pressure  pulverizers  and  should  be 
accounted  for  in  the  fuel/air  optimization  process. 

During  the  process  of  optimizing  pulverizer 
performance  to  reduce  LOI  and  improve  heat  rate 
and  NOx  emissions,  it  was  pointed  out  that  ABB 
CE  had  recommended  extending  the  pulverizer 
classifier  deflector  ring  to  the  bottom  of  the 
classifier  openings  to  improve  50  mesh  coal 
grinding. Controlled tests on Unit 9 indicated that
both 200 and 50 mesh coal grinding were improved,
especially when the primary air was controlled
near the correct ratio of 1.8 pounds of air to
1.0 pound of coal.

9.  Improved  Performance  Monitoring 
Package 

It  is  recommended  to  install  an  on-line 
performance  monitor  in  conjunction  with  the  DCS 
retrofit.  On  this  project,  EPRI's  Performance 
Monitoring  Workstation  (PMW)  was  used  and  has 
proven  to  be  an  effective  tool  for  monitoring 
plant  performance.  The  monitoring  software 
should be configurable by plant personnel to allow
additions and modifications without major
expense.


10. Simulator - Training

The use of the simulator provided an excellent
tool in the training of the operators for Unit 9.
The training familiarized the operators with the
new CRT man-machine interface, gave the
operators further opportunity to customize the
graphics, identified some problems prior to
startup, and enabled faster resolution of problems
during startup.

11. Simulator - Engineering Uses

The simulator provided on this project is a
valuable engineering tool in addition to its
usefulness as an operator training system. There
are several benefits an engineering simulator can
provide to a retrofit project such as this one.
First, it can be used to verify proper operation of
the control system logic prior to the startup of
the new control system. Second, it can be used to
test proposed modifications to the control system,
whether they are simple logic revisions or
completely new control algorithms. Third, it can
be used to evaluate plant performance and
proposed modifications to the plant mechanical
systems.

12. Marketable Ash Product

Due to improved combustion control, the Unit 9
flyash now has low enough unburned carbon to be
a salable product.

13. Plant Wide Historian

Modern digital control and data acquisition
systems like the DCS installed on Unit 9 are
capable of collecting large amounts of plant
operating data. This operating data is very
valuable in analyzing plant performance and
operating problems. It is recommended that all
DCS retrofits include a comprehensive plant
archive system which can store data from not
only the DCS but also from any other plant
system which collects data.


CONCLUSION


The benefit justification for a typical DCS retrofit
has  been  lacking.  EPRI  and  TVA  initiated  a 
project  to  quantify  the  benefits  of  installing  a  new 
distributed  control  system  on  a  40  year  old  coal 
fired  plant.  The  retrofit  has  been  completed  and 
project  benefits  documentation  is  continuing.  The 
primary  benefits  to  date  are  heat  rate 
improvement,  NOx  emission  reduction,  and 
dispatch  response  improvement.  Evaluation  of 
long  term  benefits  is  continuing  and  will  be 
documented in the future.


Table 1

KINGSTON FOSSIL PLANT INSTRUMENT & CONTROLS PROJECT - PLANNED & ACTUAL BENEFITS

Planned Benefit Description | Planned $/Year | Actual Benefit Description
1% improvement = 5 BTU Savings | $8,330 | Continue evaluation through 2000
1% improvement = 5 BTU Savings | $8,330 | Continue evaluation through 2000
25 °F, SH & RH = 60 BTU Savings | $99,960 | 1.1 BTU/°F for SH & RH = $91,630
1% Improvement = 8 BTU Savings | $13,328 | 3% reduction in LOI: 1% = 8 BTU: 3 x 8 x 1666 = $39,984
1% Improvement = 100 BTU Savings | $166,600 | 1.47% x 10,000 BTU = 147 BTU: 147 x 1666 = $244,902
10,000 Gallons/Year Savings | $7,500 | 20,000 Gal/Year reduction: 20,000 x 0.75 = $15,000
10 BTU Improvement Savings | $17,000 | Evaluation Incomplete
Not Previously Accounted For | | 35% Reduction Yearly O&M Cost: 0.35 x $100,000 = $35,000
Reduce Tube Leaks from 5 to 4/Year | $85,000 | Based on TVA capital project justification form: $115,000
Reduce Solid Particle Erosion | $166,666 | Evaluation To Continue through 2000
2 Ton/Day Handling and Volume | $5,000 | Improvement: Handling = $6,600, Volume = $13,200
2 Hour Reduction | $10,000 | 4 Events x $1,470/Hour = $5,880
40 MW Improvement Savings | $100,000 | Evaluation Incomplete
Total Benefit in $/Year | $687,714 |
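
The arithmetic in Table 1 can be checked with a few lines of Python. This is only an illustrative sketch: the factor of $1,666 per BTU and the $0.75 per gallon figure are read off the table's own calculations, and all variable names are hypothetical.

```python
# Illustrative check of the Table 1 arithmetic (all names are hypothetical).
DOLLARS_PER_BTU_YEAR = 1666   # factor implied by the table, e.g. "147 x 1666"
DOLLARS_PER_GALLON = 0.75     # price implied by "20,000 x 0.75"

# Planned $/Year entries in the order in which they appear in the table.
planned = [8330, 8330, 99960, 13328, 166600, 7500,
           17000, 85000, 166666, 5000, 10000, 100000]
planned_total = sum(planned)

# Actual benefits for which the table shows the calculation.
actual_loi = 3 * 8 * DOLLARS_PER_BTU_YEAR        # 3% LOI reduction, 8 BTU per 1%
actual_heat_rate = 147 * DOLLARS_PER_BTU_YEAR    # 147 BTU improvement
actual_gallons = 20000 * DOLLARS_PER_GALLON      # 20,000 gallons per year
actual_startup = 4 * 1470                        # 4 events at $1,470/hour

print(planned_total, actual_loi, actual_heat_rate, actual_gallons, actual_startup)
```

The planned entries sum to the stated total, and each actual figure reproduces the dollar amount printed in the table.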
Abstract

Different living beings (ants, wolves, humans, etc.) co-operate
inside closed social structures, which express collective intelligence
to solve their problems. Individual beings express intelligence for
their own purposes, as well as for their groups, even if some beings
temporarily behave illogically or even against their social structure.
Such social structures solve different problems in parallel, chaotic
ways. Another observed phenomenon is that some problems are
partially solved unconsciously in advance; thus when the problem
arises, the solution is almost waiting to be applied. The phenomenon
of collective intelligence, also known as swarm intelligence, can be
considered a part of semiotics because the essence of intelligent
co-operation is either direct exchange of information or observation and
evaluation of the activity signs of other beings. This paper will try to
define formal measures for collective intelligence based on the
inferring ability of closed social structures. A social structure will be
mapped into the Random PROLOG Processor (RPP), and on this
basis, the efficiency of the N-step inference will be evaluated as the
IQ for the social structure (IQS). The proposed concept of IQS can be
developed in a formal way and also used for practical evaluation of a
given grounded structure. On the basis of simulation we will demonstrate
how some properties of social structures affect the value of IQS.

1.  Introduction. 

Let us assume that there is a given computational space (CS)
filled with objects (agents). They either take fixed positions or
perform arbitrary movements. Some agents (interpreted as living
beings, computers, etc.) can express inferring ability; others,
which are not able to do this, will be interpreted as messengers
or messages. It will be demonstrated in this paper (see also [9],
[10]) that one common, formal description can be used for all
types of agents, based on the concept of the information
molecule (IM) and the membrane, defined for the first
time for the Chemical Abstract Machine by Berry, et al. [2].
The term IM is claimed here to be more general than the term
agent because the concept of IMs of different levels can cover
the whole spectrum of agents. A single message can be
considered as an agent without a membrane and inferring ability.
To create a structure/batch of messages, it is sufficient to
enclose such agents inside a membrane. The concept of the
membrane also allows us to define the independent internal
structure of an IM, other than the obligatory structure of an
exterior. The individual inferring agent can be considered as
the membrane enclosing an inferring system of any nature,
consisting of IMs of e.g. facts, rules, and goals (similar to the
structure of the Expert System). This formal approach can be
continued up through the levels of social structures such as the
village, city, etc. The top level CS, filled with agents, can even
be considered a meta-IM. In general, a given IM is characterized
by its inferring ability, the properties of its membrane
(which define Input/Output characteristics), and its mobility.
Zero mobility means that it takes a fixed position in the CS;
non-zero mobility must refer to the characteristic of IM
movements (e.g. chaotic movement, displacements along the
structure of streets, along data-pathways, etc.). The lowest level
of IM is proposed to have the form of logical PROLOG
clauses of facts, rules, and goals. The lowest level of the
inference process is proposed to be the act of rendezvous of IMs
followed by a PROLOG-like inference. Higher order IMs are
suitable to model, e.g., living beings, by defining the proper
membrane incorporating any internal inferring system used
for modeling the behavior of, e.g., ants. Higher level social
structures can be defined as a CS with spatial structure where any
displacements of IMs of different complexity take place. The
proposed concept of a CS filled with IMs is able to model social
systems of quasi-randomly moving insects, e.g. ants, which
communicate through the use of pheromones. We can define
the necessary (static or dynamic) spatial fields of IMs carrying
unit clauses (atoms), corresponding to the information carried
by chemical molecules of pheromones. The insects themselves
will be higher order IMs. Structures of biological cells in
any organism/body, which make contact through hormones,
can also be modeled on the concept of structured IMs and
membranes.

An Intelligence Quotient (IQ) based on pattern recognition
and problem solving tests can be defined for different beings,
including humans. Tests can have different parameters and
natures, and can verify certain components of intelligence. For
tests based on pattern recognition and problem solving ability,
equivalent logic inferring systems can be written that are able
to solve a given test and to give the number of necessary inference
steps [6], [8]. As the IQ measure for closed social structures,
we propose to use the N-step inference. The concept is that a
given social structure will be mapped into the RPP, which will
be described in the second part. In the RPP we can model or
formally evaluate the N-step inference process, which will provide
us with an IQS measure. It is important that PROLOG is used,
as it is based on the mathematical logic of first-order predicate
calculus. The different activities and behaviors of a social system
can be described through the formalism of facts, rules, and goals
as an inferring system, without regard to the nature of the given
behavior. It can be the "informal dance of the honey bee", or
the output of pheromones, or a network message.
The third part of the paper will discuss N-step inference as a
Social IQ test. In the fourth part, examples of mapping from
social structure to the RPP will be given, with simulations and
evaluation of IQS for some specific social structures. A simple
case will be considered where a being is approximated by a
single clause of a fact, rule, or goal. This way the effect of
certain configurations and statistical factors of closed social
structures on IQS will be verified.

2.  Virtual  Random  PROLOG  Processor. 
2.1.  RPP  Architecture. 

The Virtual Processor for the RPP is defined as the CS using
the notation of Multisets [1] and cham [2].
The 0-level $CS^0$ is defined as a single IM of one PROLOG
clause $c_i$ of a fact or rule or goal. It can only undergo internal
transition. For any CS, we can define the membrane [2],
denoted by $|\cdot|_{p_i}$, which encloses inherent facts, rules, and goals.
$CS^0$ has no membrane. It is obvious that

$CS^1 = \{c_1, \ldots, c_n\} = \{|c_1, \ldots, c_n|\}$

For the given kind of membrane, its type $p_i$ defines which IMs
can pass through it. Such an act is considered as Input/Output.
If the CS also contains other $CS^i$, it is then considered a higher
order one, depending on the level of the internal $CS^i$. Such
internal $CS^i_j$ will also be labeled with $v_j$, e.g.

$CS^2 = \{|c_1, \ldots, CS^1_j, \ldots, c_n|\}$ iff $CS^1_j = \{|b_1, \ldots, b_m|\}$

where $b_i$, $i = 1 \ldots m$ and $c_j$, $j = 1 \ldots n$ are clauses.
Every $c_i$ and internal $CS^i_j$ will be labeled with $v_j$ to denote
characteristics of its individual displacements. The general practice
will be that higher level CSs will take fixed positions, i.e. will
create structures, and lower level CSs will perform
displacements. For a given CS there is a defined function pos
(position) of any nature:

$pos: O_i \rightarrow (\text{position description}) \cup \text{undefined}$

where $O_i \in CS$.
The following examples of pos can be given:

$pos(O_i, t) = (x, y, z)$

where x, y, z are fixed Cartesian co-ordinates or the position at a
given moment t;

$pos(O_i, \Delta t) = (\Delta x, \Delta y, \Delta z)$

where $\Delta x$, $\Delta y$, $\Delta z$ are components of the vector in 3-D space
describing the displacement of the element in the given time
period $\Delta t$;

$pos(O_i, t \text{ or } \Delta t) = \mu(x, y, z)$

where $\mu$ is any fuzzy function over Cartesian space, defining
the position of a given clause or the internal CS. Obviously
there are also other possible definitions of pos.
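
The nesting of computational spaces described above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: the class names `Clause` and `CS`, the `membrane` string, and the rule that a CS sits one level above its highest-level member are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

Position = Tuple[float, float, float]


@dataclass
class Clause:
    """A 0-level CS: a single PROLOG clause with no membrane."""
    text: str
    pos: Optional[Position] = None   # None stands for an undefined position

    @property
    def level(self) -> int:
        return 0


@dataclass
class CS:
    """A computational space: a membrane enclosing clauses and lower CSs."""
    members: List[Union["Clause", "CS"]] = field(default_factory=list)
    membrane: str = "open"           # hypothetical membrane type label
    pos: Optional[Position] = None

    @property
    def level(self) -> int:
        # Assumption: a CS is one level above its highest-level member.
        return 1 + max((m.level for m in self.members), default=0)


# A being holding two clauses, placed inside a city together with a message.
being = CS([Clause("a(1)."), Clause("answer(Y) :- a(X).")], pos=(0.1, 0.2, 0.3))
city = CS([being, Clause("message(hello).")], membrane="closed")
print(being.level, city.level)
```

Here `being` plays the role of a 1-level CS and `city` the role of a 2-level CS containing it.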

If there are any two internal CS objects $O_i$, $O_j$ in the given CS,
then there is a defined distance function

$D(pos(O_i), pos(O_j)) \rightarrow \Re$

and rendezvous distance $d \in \Re$. We say that during the
computational process, at any time $t$ or time period $\Delta t$, two objects
$O_i$, $O_j$ come to rendezvous iff

$D(pos(O_i), pos(O_j)) < d.$

The rendezvous act will be denoted by the relation ® (see footnote 1),
e.g. $O_i$ ® $O_j$, which is reflexive and symmetric, but not transitive.
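
The rendezvous condition and its stated properties can be illustrated in a few lines. The sketch assumes Euclidean distance for D, which is only one admissible choice, and the function name `rendezvous` is hypothetical.

```python
import math


def rendezvous(pos_a, pos_b, d):
    """True when two objects are closer than the rendezvous distance d.

    Assumption: D is the Euclidean distance between Cartesian positions.
    """
    return math.dist(pos_a, pos_b) < d


# Three objects on a line, with rendezvous distance d = 1.0.
a = (0.0, 0.0, 0.0)
b = (0.8, 0.0, 0.0)
c = (1.6, 0.0, 0.0)
d = 1.0

reflexive = rendezvous(a, a, d)
symmetric = rendezvous(a, b, d) == rendezvous(b, a, d)
# a meets b and b meets c, yet a does not meet c: the relation is not transitive.
chain_breaks = rendezvous(a, b, d) and rendezvous(b, c, d) and not rendezvous(a, c, d)
print(reflexive, symmetric, chain_breaks)
```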

The computational process for the given CS is defined as the
sequence of frames $F$ labeled by $t$ or $\Delta t$, interpreted as the
time (given in standard time units or simulation cycles), with a
well-defined start and end, e.g. $F_{t_0}, \ldots, F_{t_e}$. For every frame its
multiset $F_j = \{|c_1, \ldots, c_m|\}$ is explicitly given, with all related
specifications: $pos(\cdot)$, membrane types $p$, and movement
specifications $v$.

The simplest case of the Virtual Processor for the RPP is the 3-D
space with randomly travelling clauses of facts, rules, and
goals. Such a simple processor is initialized to start the
computational process in such a way that the set of clauses, facts,
rules, and goals (defined by the programmer) is injected into
this CS. After some time, if the proper form of goal clause is
reached, e.g. $c_j$ = answer(...)., then $v_j$ of this clause is
automatically changed to provide migration to the necessary
destination, e.g. to the bottom of the cube. Later on, such IMs can pass
through the CS membrane to the outside. In this way, the
appearance of the solution of the problem anywhere in the CS is
observable. More advanced examples of the Virtual Processor
for the RPP are, e.g., a single main $CS^2$ with a set of internal
$CS^1_i$ which take fixed positions inside $CS^2$, and a number of
$CS^0$ which are either local for a given $CS^1_i$ (because the
membrane is not transparent for them), or global for any subset of
$CS^1_j \in CS^2$.

When modeling the inference power of closed social
structures, interpretations in the structure will be given for all
$CS^m_n$, i.e. "this CS is a message"; "this is a single being";
"this is a village, a city", etc. At the end of this section, the
importance of properly defining $v_j$ for every $CS^i_j$ should be
emphasized. As has been mentioned, the higher level $CS^i_j$
will take a fixed position to model substructures like villages
or cities. If we model a single being (e.g. human) as $CS^1_j$,
then $v_j$ will reflect displacement of the being. Characteristics
of the given $v_j$ can be purely Brownian or can be quasi-random,
e.g. in lattice, but from the computational point of
view, it is profitable to subject it to the present form of $CS^i_j$.

1 For another definition of rendezvous as the λ-operator, see
Fontana, et al. [3].

With the proper characteristics of $v_j$, the following can be
provided:

• migration of the goal clause, when it reaches the final form
answer(...)., toward the Output location. This can be a
membrane of the main CS or even a specific, local CS;

• temporarily, the density of some IMs can be increased in
the given area of a main CS in such a way that after the
given low-level $CS^i_j$ reaches the necessary form, it
migrates to specific area(s). This will significantly increase
the speed of inference.

2.2.  Language  of  Random  PROLOG. 

The RPP language is elementary, but elements from attribute
grammars are necessary, as well as the redefinition of the rule and
goal clause. The rule and goal clause in the RPP is an
unordered set S( ) of unit clauses, contrary to standard
PROLOG, where it is an ordered list of unit clauses. Such a
set can even be considered as a local, invariable CS of rule or
fact, containing unit clauses. This approach is compatible with
the general philosophy of the RPP, and introduces additional
nondeterminism of inferences on the level of unification. In
the RPP, every fact, rule, and goal is associated with two
inherited attributes: position pos and probability P of occurrence
at the given state of computations. During the simulations, the
values of certain attributes are of particular interest, e.g. the
probability that the given inference will be successfully done. Our
experiments demonstrate that in practice it is difficult or even
impossible to calculate P for more complicated cases. Only
general conclusions can be reached in a formal way. For some
simulations it will be profitable to introduce one more
attribute: configuration, to define the shape and to order in any way
(linearly or even spatially) the unit clauses in the S( ). This
will provide an easy instantiation of the RPP clause to the
linearly ordered clause of standard PROLOG.

The general pattern of inference in Random PROLOG,
generalized for any CS, has the form:

$CS^i_j$ ® $CS^k_l$,                         ; rendezvous exists
$U(CS^i_j, CS^k_l)$,                        ; check unification
$C$(one or more $CS^m_n$ of conclusion)     ; check if satisfiable
|-
$R(CS^i_j$ or $CS^k_l)$                     ; retract parent IM if necessary.


In general, successful rendezvous can result in the birth of one
or more child IMs. All of them must then fulfil a C( )
condition; otherwise they are aborted. Because our system is based
on inferring in logic, the symbol of reaction -> used in cham
semantics is replaced with |- (inference).
Our proposed implementation of the RPP is tuned to evaluate
the inference power of closed social structures. It was
necessary to make some simplifying assumptions. Rendezvous and
direct inference between two $CS^i_j$ if $i \ge 1$ will be left for
further research. Thus this paper deals with a single $CS^1$ as the
main CS only. The example of multiple lower level subspaces
$CS^1$ and clauses $CS^0$ existing, being displaced, and inferring in
a given higher level $CS^2$ makes it possible to model multiple
beings H performing internal inferences (in their brains),
independently of higher level, co-operative inferences inside
$CS^2$ and exchange of messages of type $CS^0$.
It is also important to assume that the results (products) of an
inference are not allowed to infer between themselves
immediately afterwards. Products of inference must immediately
disperse; however, later on, inferences between them are allowed
if they rendezvous.

Now, let us discuss an example of the type of inference between
clauses that occurs when the rendezvous takes place.

Inference: fact with rule

    a(1).    ®    answer(Y) :- a(X),
    (fact)        (rule)       greater(X, 0),
                               sum(X, 5, Y),
                               retract(a(X).),
                               assert(message("molecule a(X). killed").).
    |-
    answer(6).    (logical result)

Therefore, after the rendezvous the following processing takes
place:

step 1: matching

    unification of clauses a(1) and a(X) => {X/1}

step 2: evaluation

    logical evaluation of greater(1, 0) => True
    numerical calculation of sum(1, 5, Y) => True (Y is bound to 6)

step 3: retract and assert control actions

    retract one parent; fire the message molecule into the RPP;

step 4: child molecule is produced

    the logical expression (parent rule) reduces to a unit clause, i.e.
    the fact answer(6). (because there are no evaluations into
    False and all built-in predicates are executed successfully).
    The IM answer(6). can be considered a legal child. The
    message molecule is the "background child".
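
The four steps above can be traced in a short Python sketch. It hard-codes the single rule from the example rather than implementing general unification, and the function name, the tuple encoding of clauses, and the returned dictionary are all assumptions made for illustration.

```python
def infer_fact_with_rule(fact):
    """Rendezvous of a fact a(N) with the rule
    answer(Y) :- a(X), greater(X, 0), sum(X, 5, Y), retract(...), assert(...).

    Facts are encoded as (functor, value) tuples (an assumption of this sketch).
    Returns None when the inference fails.
    """
    functor, value = fact

    # step 1: matching, unify a(N) with a(X), giving the substitution {X/N}
    if functor != "a":
        return None
    x = value

    # step 2: evaluation of the built-in predicates
    if not x > 0:            # greater(X, 0)
        return None
    y = x + 5                # sum(X, 5, Y)

    # step 3: control actions, retract the parent fact and fire a message
    retracted = [fact]
    message = ("message", "molecule a(X). killed")

    # step 4: the rule reduces to a unit clause, the legal child
    child = ("answer", y)
    return {"child": child, "background_child": message, "retracted": retracted}


result = infer_fact_with_rule(("a", 1))
print(result["child"])
```

With the fact a(1). the child is answer(6)., matching the logical result shown in the example; a fact that fails greater(X, 0) produces no child.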

The syntax of the PROLOG dialect used for the RPP resembles a
simple PROLOG. Nevertheless, a quite different execution
model yields new possibilities of inference diagrams, as well
as creating new demands for built-in clauses, especially for
control clauses. In the standard PROLOG execution model
(e.g. WAM) we are restricted to backward inference, starting
from a goal [4], [5], [7], [11]. The nature of the RPP is
random or quasi-random, with parallel rendezvous of IMs
carrying clauses, so it is possible to implement
more inferring diagrams between facts, rules, and goals. We
can have backward, forward, and aggregated inferences (i.e.
rule with rule) at the same time.

Remarks:

• simulation experiments for the N-step inference have
demonstrated the power of a rule-rule inference diagram. Rules
infer together, building "aggregates" which are more
powerful rules (observed also in GAMMA [1]). As a result,
the final solution can suddenly be generated on the basis of
e.g. a starting fact and such an aggregated rule;

• in the RPP, when a clause is produced in the form
answer(...)., parameter $v_j$ is changed in such a way that
this IM moves toward the assumed Output of the RPP;

• there is no restriction on how many copies of a given fact
or rule or goal we can have in the RPP. This is a very
useful tool to speed up some inferences.
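
The rule-rule "aggregate" mentioned in the first remark can be sketched for the propositional case. The sketch assumes clauses without variables and encodes a rule as a head plus a frozenset body, echoing the unordered rule bodies of the RPP; the function name `aggregate` is hypothetical.

```python
def aggregate(rule_a, rule_b):
    """Combine two propositional rules into one more powerful rule.

    A rule is (head, frozenset_of_body_atoms). If the head of rule_b occurs
    in the body of rule_a, that atom is replaced by the body of rule_b.
    Returns None when the two rules cannot be combined in this direction.
    """
    head_a, body_a = rule_a
    head_b, body_b = rule_b
    if head_b not in body_a:
        return None
    return (head_a, (body_a - {head_b}) | body_b)


r1 = ("a1", frozenset({"a0"}))   # a1 :- a0.
r2 = ("a2", frozenset({"a1"}))   # a2 :- a1.

combined = aggregate(r2, r1)     # a2 :- a0.
print(combined)
```

After this aggregation a single rendezvous of the starting fact a0 with the combined rule is enough to reach a2, which is the shortcut effect described in the remark.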

3.  N-STEP  INFERENCE  FOR  IQS  MEASURE. 

The basic question is how to find proper IQ tests for closed systems
HS of beings H. Different beings express intelligence through
different actions, e.g. the ability to find a path in a maze. Such
actions can be expressed as inferences, in a similar way as we
simulate the intelligence of a human through the structure of Expert
Systems. The number of inferences per given period of time is not
sufficient as the IQS test. It is easy to find examples of situations when
the frequency of inferences is high, but the desired final
conclusion is not reached at all (e.g. infinite loops). The
benchmark which is proposed in this paper is the N-step inference
(NSI). It can be defined as follows:

Definition: N-step inference (NSI)

There is a given 1-level CS, $CS^1 = \{|c_1, \ldots, c_n|\}$, such that the
following are defined:

• $D(pos(O_i), pos(O_j)) \rightarrow \Re$ and rendezvous distance $d$;

• $v_j$, $j = 1 \ldots n$, i.e. characteristics of random displacements;

• in $CS^1 = \{|c_1, \ldots, c_n|\}$ there exists a subset of clauses (one
fact, some rules, one goal) of the form
$\{a_0,\ a_1 \text{ :- } a_0,\ a_2 \text{ :- } a_1,\ \ldots,\ a_n \text{ :- } a_{n-1}\}$.
The number of copies of selected $c_j$ is one, to make the benchmark
more rigorous.

• The IQS is measured by the probability P that after m
frames F, the conclusion $a_n$ will be reached from the starting
fact $a_0$. This is denoted $IQS = P^{m}_{n\text{-step}}$. If we use
experimental evaluation, then the average number of frames m
is given (see footnote 2), after which $a_n$ is reached.
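
A very small simulation can make the benchmark concrete. The sketch below is an assumption-laden illustration, not the authors' simulator: molecules perform a Gaussian random walk clamped to the unit cube, only forward fact-with-rule inference is modeled, parents are not retracted, a newborn fact appears at the position of its rule and can infer only from the next frame, and the IQS is estimated as the fraction of seeded runs that reach $a_n$ within m frames. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_nsi(n_steps, d, sigma=0.1, max_frames=10_000, seed=0):
    """Frames needed to derive a_n from a_0, or None if not reached."""
    rng = random.Random(seed)

    def random_pos():
        return [rng.random() for _ in range(3)]

    def walk(p):
        # Gaussian random step, clamped to the unit cube.
        return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in p]

    facts = {0: random_pos()}                                   # fact a_0
    rules = {k: random_pos() for k in range(1, n_steps + 1)}    # a_k :- a_(k-1)

    for frame in range(1, max_frames + 1):
        facts = {k: walk(p) for k, p in facts.items()}
        rules = {k: walk(p) for k, p in rules.items()}
        born = {}
        for k, fact_pos in facts.items():
            rule_pos = rules.get(k + 1)
            if rule_pos is None or (k + 1) in facts:
                continue
            if math.dist(fact_pos, rule_pos) < d:    # rendezvous
                born[k + 1] = list(rule_pos)         # child fact a_(k+1)
        facts.update(born)     # children become active in the next frame
        if n_steps in facts:
            return frame
    return None


def estimate_iqs(n_steps, d, m, trials=100):
    """Fraction of seeded runs in which a_n is reached within m frames."""
    hits = sum(simulate_nsi(n_steps, d, max_frames=m, seed=s) is not None
               for s in range(trials))
    return hits / trials


# With d larger than the cube diagonal every pair meets in every frame,
# so an n-step chain needs exactly n frames.
print(simulate_nsi(4, d=2.0), estimate_iqs(3, d=2.0, m=3, trials=5))
```

Choosing a smaller d or a smaller sigma makes rendezvous rarer, so more frames are needed and the estimated probability for a fixed m drops.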

Justification  for  this  benchmark  is  as  follows. 

1 .  N-step  inference  can  be  very  broadly  interpreted  as: 

•  any  problem-solving  process  in  HS  or  inside  a  single  H, 
where  N  inferences  (logical  or  computational  steps)  are 
necessary  to  get  a  result; 

•  any  production  process,  where  N-technologies/elements 
have  to  be  found  in  HS  and  unified  into  one  final  tech- 
nology or  production  etc. 

2.  Simulating  N-step  inference  in  the  RPP,  we  can  model  the 
situation  in  a  very  natural  way,  where  the  resources  for  in- 
ference (represented  by  starting  fact,  rules,  goal)  are  dissi- 
pated, moving,  (or  locally  concentrated)  around  the  CS. 
This  reflects  well  dissipated,  moving,  or  concentrated  re- 
sources in  the  village,  city,  or  nation; 

3.  With  this  benchmark  very  easy  cases  can  be  simulated 
where  some  elements  of  the  inference  chain  (missing  facts, 
rules)  are  not  available  at  the  moment.  This  means  that  the 
chain  of  inference  is  temporarily  broken,  but  after  a  certain 
time  period,  another  inference  process  (e.g.  which  is  run- 
ning in  the  background  or  parallel)  will  produce  the  miss- 
ing component.  Such  situations  are  well  known  in  Human 
Systems,  e.g.  when  a  given  research  or  technological  dis- 
covery is  blocked  until  missing  e.g.  theorems  or  sub- 
technology  is  discovered/produced/found. 

4. HS of humans infer in all directions, i.e. forward (e.g.
improvements of existing technology), backward (e.g.
searching for how to manufacture a given product, going back
from the known formula), and also through generalization,
e.g. two or more technologies can be combined into one
more general and powerful technology or algorithm.
N-step inference simulated in the RPP reflects all these cases
very well.
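
The N-step inference benchmark described above can be illustrated with a toy simulation. The following Python sketch is not the authors' RPP system; it only assumes that a fact molecule a_i and a rule molecule a_i => a_(i+1) perform Gaussian random walks in the unit cube and produce a_(i+1) when they come within the rendezvous distance d. The function name, the clamping at the cube walls, and the treatment of sigma as a per-axis standard deviation per frame are assumptions made for illustration.

```python
import math
import random


def simulate_n_step(n_steps, d, sigma, max_frames, seed=0):
    """Return the number of frames until fact a_n appears, or None."""
    rng = random.Random(seed)

    def rand_pos():
        return [rng.random() for _ in range(3)]

    def step(pos):
        # Gaussian displacement per axis, clamped to the unit cube (assumption).
        return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in pos]

    # One molecule for the starting fact a_0 and one per rule a_i => a_(i+1).
    facts = {0: rand_pos()}
    rules = {i: rand_pos() for i in range(n_steps)}

    for frame in range(1, max_frames + 1):
        facts = {i: step(p) for i, p in facts.items()}
        rules = {i: step(p) for i, p in rules.items()}
        for i, fpos in list(facts.items()):
            close = i in rules and math.dist(fpos, rules[i]) <= d
            if close and (i + 1) not in facts:
                # The new conclusion appears at the rendezvous point.
                facts[i + 1] = list(fpos)
        if n_steps in facts:
            return frame
    return None


# Small, fast configuration; the paper's d = 0.025 would need many more frames.
frames = simulate_n_step(n_steps=3, d=0.3, sigma=0.1, max_frames=100000, seed=1)
```

Averaging the returned frame count over many seeds would give an experimental estimate of m in the sense used above.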

4.  Evaluating  the  inference  power  of 
closed  social  structures. 

Now  let's  investigate  with  the  help  of  the  proposed  IQS 
benchmark  some  very  simple  problems  referred  to  within  the 
HS. 

Phenomenon  1:  City 

Suppose that a CS is given, and IQS is a 10-step inference.
Let's analyze how the speed of NSI is affected if the city is
defined in the CS as any privileged area. It is difficult to
define what a city really is, even for humans; thus a simple
intuitive concept is applied for these experiments, where the
HS consists of a highly populated area (city) surrounded by a
sparsely populated region. For our experiment we define the city as


2  In  computer  simulations,  every  frame  F  corresponds  to  one 
cycle  of  recalculation  of  situation  in  CS. 
the local CS' ⊂ CS with a fixed position at (0,0,0) and
radius R = 0.2, while the whole CS is the cube 1 x 1 x 1.
The membrane for CS' is defined in such a way that IMs, after
entering the (city) CS', must stay there for a period of a fixed
number of cycles I (the membrane will reflect back into the
city all molecules that have stayed there fewer than the prescribed
number of cycles), and after this period the IM is allowed to depart
through the membrane to the outside. In the city, the
properties of random walking should be different; thus, the variance
of the Gaussian distribution of speed is set to v = R/2 (half of
the city radius R). The city speeds up the NSI to a surprising
degree, as demonstrated in Table 1. The conclusion from
the simulations confirms what looks trivial and obvious to us:
that the existence of the city is profitable beyond any
discussion, at least from the speed-of-inference point of view.
Now, with the RPP and NSI, we have tools to analyze this
phenomenon, and we can immediately raise the following
analytical problems:

•  the  optimal  size  of  the  city  as  the  function  of  population 
size,  d,  location  of  the  city  in  CS,  parameters  of  movement 
of  IMs,  etc.; 

•  the  better  option  for  HS:  a  single  city  or  a  structure  of  cities 
-  which  type  of  cities  is  optimal?; 

• how far the city will reduce the v (which can be recalculated
into energy savings in structures of beings) if the city
is introduced, assuming the same NSI speed as before;

• etc.


Rendezvous distance d = 0.025; size of CS: 1 x 1 x 1;
size of CS': sphere at (0,0,0), R = 0.2;
Brownian movements: Gaussian distribution with variance v = 0.124, μ = 0;
NSI = 10-step inference

                                  CS without City   CS with City (50 cycles inside)
Conclusion reached after cycles   17643             1436
Final number of IMs in CS         69                74

Table 1. Influence of City on speed of NSI
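
The speed-up implied by Table 1 can be checked with a few lines of Python. This is only a worked calculation on the two cycle counts reported in the table; the variable names are illustrative and are not part of the original simulation system.

```python
# Cycle counts taken from Table 1 (10-step inference, d = 0.025).
cycles_without_city = 17643
cycles_with_city = 1436

# Ratio of the two counts: how many times sooner the conclusion was reached.
speed_up = cycles_without_city / cycles_with_city
print(f"speed-up with city: {speed_up:.1f}x")
```

For this single pair of runs, the city shortens the inference by roughly a factor of twelve.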

Phenomenon  2:  Ability  to  travel/communicate 

The average global speed of molecules ⟨v_global⟩ can be easily
interpreted as the average ability to travel and to communicate.
If we want to simulate the ability to communicate (exchange
information), it is necessary to introduce a special type
of IM, M, where M ≠ H, interpreted as the messages H → H. By
definition M has a very high (compared to beings) ability to
travel, i.e. ⟨v_M⟩ >> ⟨v_H⟩. As a result, ⟨v_global⟩ will
increase to a high level in the CS simulating the given HS. In
HS with restricted communication abilities, ⟨v_global⟩ will be
much lower because the ⟨v_H⟩ component will prevail.

Let's investigate this problem with the help of a simulation of
NSI = 10 steps in the simple CS (without the city). The
existence of the city highly compensates for the lack of ability for
distance travelling or remote exchange of information, because
"everything is nearby".

The general conclusion that higher ⟨v⟩ gives higher speed of
NSI (see Table 2) is not very revealing, but with the RPP
model of computations we can evaluate this conclusion toward
practical applications, e.g.:

"how does the introduction of a remote communication system, e.g.
a telephone system, affect the speed of inference measured with the
help of NSI in the given HS?" Statistical data like "the number
of trips and the average distance per human per year" or "the
number of telephone calls and the average distance of the call
per person per year" should be easily available.


Rendezvous distance d = 0.025; size of CS: 1 x 1 x 1; no City;
Brownian movements: Gaussian distribution with μ = 0;
NSI = 10-step inference

Variance                          ⟨v⟩ = 0.124   ⟨v⟩ = 0.062   ⟨v⟩ = 0.031
Conclusion reached after cycles   17643         28265         45029
Final number of IMs in CS         69            44            97

Table 2. Influence of average speed ⟨v⟩ on speed of NSI.
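
How strongly the cycle count in Table 2 depends on ⟨v⟩ can be summarized numerically. The Python sketch below uses only the three rows of the table; the power-law form cycles ~ ⟨v⟩^(-k) is an assumption introduced here for description and is not a claim made by the original experiments.

```python
import math

# (average speed <v>, cycles until the conclusion) taken from Table 2.
runs = [(0.124, 17643), (0.062, 28265), (0.031, 45029)]


def slowdown_factors(rows):
    """Ratio of cycle counts between consecutive rows."""
    return [later[1] / earlier[1] for earlier, later in zip(rows, rows[1:])]


def scaling_exponents(rows):
    """Exponent k in cycles ~ v**(-k), estimated from consecutive rows."""
    return [
        math.log(later[1] / earlier[1]) / math.log(earlier[0] / later[0])
        for earlier, later in zip(rows, rows[1:])
    ]


factors = slowdown_factors(runs)
exponents = scaling_exponents(runs)
```

In these three runs each halving of ⟨v⟩ lengthens the inference by a factor of about 1.6, which is less than proportional. With only three data points this describes the table and should not be read as a fitted law.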

Phenomenon  3:  The  problem  of  Opposition,  (ability 
to  infer  in  logically  inconsistent  environments). 

One of the phenomena of Social Structures HS is opposition.
It can be any action of being(s) with individual interests
directed against the global interest of the HS. It can also be
unconscious wrongdoing by being(s) (caused by the lack of
necessary information), or even technical failures/accidents. With
the help of mathematical logic we can express such situations
with an inconsistent inferring system. A simple case of
opposition can be expressed in such a way that negated
conclusions are made in parallel to correct ones, e.g. in different
areas of the CS. Rules a=>b; a=>¬b can be used to get
contradictory results. Another possibility is that if the global interest
of the HS is to reach a given goal as soon as possible through a
given chain of inference, some being(s) H will always go back
to reverse/obstruct these inferences. In Fig. 1 the IL loops
represent such situations. In our simulations, for compensation
purposes, it has been assumed that favorable events also
exist in the inference process, marked with J, to model
situations where some conclusions have been reached "way ahead". It
is interesting to demonstrate that chaotic inferring systems
can exist and work well, even with opposition, but under the
condition that stabilizing measures are applied to reduce
combinatorial explosions of inferences.

Fig. 1. System with infinite loops and inconsistent inferences.

This is also the general
problem of other inferring systems when incorrect heuristics
are applied. Systems HS with opposition can't exist at all
without introducing tough stabilizing measures to control the
number of IMs. The alternative is immediate combinatorial
explosion and "death" of any such inferring system. The
simplest proposal to reduce the combinatorial explosion is based
on the assumption that every new conclusion will retract an
old one (e.g. parent(s)), which can be freely interpreted in real
life as the exchange of generations of beings. The problem of
inferences a=>b; a=>¬b can be stabilized in such a way that
when there is a rendezvous of IMs like b and ¬b, the result
will be retraction of both. The following rule can be used:

retract(X(Y).), retract(¬X(Y).) :-
    molecule(X(Y).), molecule(¬X(Y).).
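
The effect of this stabilizing rule can be sketched outside PROLOG as well. The Python function below is an illustrative reading of the rule, not the authors' implementation: it assumes that each molecule is a tuple (name, negated, position) and that a rendezvous means two molecules lie within distance d of each other. The function name and the tuple layout are hypothetical.

```python
import math


def retract_contradictions(molecules, d):
    """Remove every pair (X, not X) whose members meet within distance d.

    Each molecule is a tuple (name, negated, position).
    """
    removed = set()
    for i, (name_i, neg_i, pos_i) in enumerate(molecules):
        if i in removed:
            continue
        for j in range(i + 1, len(molecules)):
            if j in removed:
                continue
            name_j, neg_j, pos_j = molecules[j]
            contradictory = name_i == name_j and neg_i != neg_j
            if contradictory and math.dist(pos_i, pos_j) <= d:
                # Both the conclusion and its negation are retracted.
                removed.update((i, j))
                break
    return [m for k, m in enumerate(molecules) if k not in removed]


molecules = [
    ("b", False, (0.10, 0.10, 0.10)),  # b
    ("b", True, (0.11, 0.10, 0.10)),   # not b, close to b: both retracted
    ("b", True, (0.90, 0.90, 0.90)),   # not b, far away: survives
    ("c", False, (0.11, 0.10, 0.10)),  # unrelated fact: survives
]
survivors = retract_contradictions(molecules, d=0.025)
```

Applying such a step in every cycle keeps contradictory conclusions from accumulating, which is the purpose of the retraction rule.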

In Fig. 1 the main points of interest are the infinite loops caused
by certain types of opposition inside HS, and how effective the
proposed stabilizing measures are. Thus the negative
inferences, i.e. a=>¬b, are left dangling.


Size of CS: 1 x 1 x 1; rendezvous distance d = 0.025; no City;
Brownian movements: Gaussian distribution with μ = 0, ⟨v⟩ = 0.124;
NSI = 10-step inconsistent inference

Conclusion reached after cycles   25436
Final number of IMs in CS         108

Table 3. Influence of opposition inferences on speed of NSI.


5.  Conclusion 

This paper presents the concept of an Intelligence Quotient IQS
for Social Structures of any nature. As the theoretical base, the
chaotic, molecular model of computations has been used,
which can be considered an implementation of the
Nondeterministic Turing Machine. For programming purposes, a
dialect of PROLOG is used, because the logic of predicate
calculus gives the best chance to describe the various
behaviors inside social structures which can be considered as
inferring processes. The simulation system has been built in
parallel C for an 8-processor SGI computer, with 3-D graphics to
observe chaotic parallel inferring processes. Our simulations
were implemented with this tool. The presented research results
give clear directions for future research and intended
commercial application:

•  Different  HS  can  be  compared  for  their  IQS  values  and 
ordered,  both  theoretically  and  through  simulations; 

•  How  certain  social,  political  and  technical  factors  affect  the 
IQS  can  be  analyzed  for  a  given  HS; 

•  The  proposed  approach  gives  a  way  to  define  a  formal 
theory  of  IQS  for  HS; 

• Further studies are necessary on the methodology of how
the behavior of some non-human beings (bees, ants, etc.)
should be translated into facts, rules, and goals, which
subsequently could be used for IQS evaluation.
Abstract: A viewpoint is presented that
emergence is a fundamental property of open
complex systems and thus that natural language
understanding by humans depends on the
emergence of understanding. Human
understanding is based on the assembly of
components from implicit memories.
Autonomous understanding of targets of
investigation by computational systems depends
critically on the extraction of features from the
target and the use of logico-linguistic principles to
encode structural characteristics of these
features. It is argued that linguistic principles
reflect general systems properties because the
underlying morphology of concept formation
must conform to a class of general systems
properties. Also, linguistic principles reflect
general systems properties that occur in the
world. New logical principles are required to
design and implement machine-based
autonomous investigations of natural systems.

Introduction: Neural systems derive implicit
structure and encode structural representations of
this structure into the physical substrates of the
brain (Schacter & Tulving, 1995). Through
adaptation, the components of the substrate
express emergent neural phenomena that have a
correspondence between the structural invariants
of the world experienced and the internal mental
events that represent this experience to the higher
order processing centers of the brain (Pribram,
1991; Levine et al., 1993; Prueitt, 1997).

The phenomenon of emergence goes to
the  issue  of  how  knowledge  can  be  extracted 
from  a  data  source,  because  the  extraction  must 
make  correlations  over  time  and  develop  a 
flexible  mapping  between  internal 
representations  and  external  objects  and  agents 
(Michalski,  1994).  The  computational  system 
must  have  an  "understanding"  of  how  objects  are 
composed  from  a  substrate  and  how  they 
function  over  time.  This  understanding  can  be 
"engineered"  into  software  and  hardware; 
however,  the  situational  analysis  that  is  to  be 
derived  must  have  some  of  the  same  properties  of 
openness  as  do  natural  systems. 


Open systems theory extends
engineering and closed-loop control theory into a
new class of logics similar to quantum logic and
the logics of C. S. Peirce. These new logics
reflect several classes of open systems
properties. The permutation of systems into
each other produces one class of properties;
however, the focus of this paper is on the
features of assembly/disassembly processes.

The  logics  of  situational  analysis  for 
complex  natural  systems  must  have  formal 
means  to  describe  how  constituents  are 
assembled  into  operational  wholes  and  how 
operational  wholes  are  disassembled  into 
components.  It  is  only  when  these  properties  are 
part  of  the  computational  paradigm  that  one  is 
entitled  to  talk  about  autonomous  intelligence. 

This  means  that  formal  tools  for 
describing  assembly,  degeneracy  and 
indeterminacy  must  be  part  of  the  computational 
logic  (Tzvetkova,  1995).  The  interface  between 

a  human  user  and  a  computational  system  will 
have  to  use  such  tools  to  organize  information 
and  provide  a  pragmatic  axis  to  situational 
analysis  (Prueitt,  1997).  This  means  that 
theoretical/experimental  results  from  general 
systems  theory  must  inform  the  architectural 
issues  regarding  the  design  of  the  human/ 
computer  interface  and  the  issue  of  machine 
understanding  of  natural  language. 

A  central  issue  facing  general  systems 
theory  is  about  whether  or  not  a  class  of 
transformations  can  be  characterized  using 
methods  more  powerful  than  numerical  models. 
In fact, Zabezhailo et al. (1995) demonstrate that
a  predictive  theory  of  biochemistry  is  realizable 
by  using  special  logics  that  perform  iconic 
computations.  These  computations  are  carried 
out  using  the  special  inference  operators  of  Quasi 
Axiomatic  Theory  (QAT)  (Finn,  1991,  1995). 
The  paths  in  "iconic"  space  are  analogous  to 
trajectories  in  numerical  state  spaces;  however, 
these  iconic  trajectories  are  lawfully  constrained 
by  the  information  in  a  database  containing  the 
results  of  specific  analysis  of  biochemical 
structural  activity  relationships.  Pospelov  (1986; 
1995)  refers  to  these  trajectories  as  syntagmatic 
chains. 
As another example, the relationship
between canalization in simplicial complexes
(Johnson, 1996) and structural adacity (Burch,
1989) in artificial life systems provides a means
for the simulation of some of the mechanisms
that are enfolding the individual experience of
computational agents in a complex artificial
ecosystem. However, much more needs to be
done with artificial life systems before they will
provide adequate models of the properties of
natural open systems.

Natural  and  Artificial  Systems:  In  (Prueitt, 
1996a)  the  author  postulated  a  fundamental 
difference  between  a  natural  system  and  a 
computational  system.  It  is  postulated  that  the 
difference  is  so  great  as  to  identify  an 
oppositional  scale  with  "natural  system"  at  one 
end  and  "computational  system"  at  the  other.  As 
in  all  oppositional  scales  the  midpoint  is  most 
interesting  because  it  is  at  this  point  that  the 
classification  of  a  system  as  computational  or 
natural  breaks  down  completely  (Pospelov, 
1994).  Figure  1  identifies  this  midpoint  as  a 
perceptual system. Language systems and iconic
systems are born from complementarity, with
iconic systems more computation-like and
language systems more similar to nature.

Figure 1: A semantic oppositional scale between natural
systems and computational systems (diagram labels: Language
System, Iconic System)

Semiotic  knowledge  bases  are  indexed 
to  numerical  computations,  as  are  computer 
implementations  of  structural  adacity  and 
canalization.  Engineering  is  in  total  control, 
except  to  the  extent  that  some  critical  issues 
regarding  how  things  are  composed  might  be  left 
to  the  decision  of  a  natural  system  in  interaction 
with  the  computer.  But  seamless  human/ 
computer  interaction  supporting  situational 
analysis  has  been  very  difficult  to  achieve. 

How then do we actually get out of the
domination of computational processes in the
development of situational models? The answer
may  be  in  the  definition  of  situationally  specific 
"intermediate"  languages  as  an  aid  to  translation 
between  the  computer  and  the  natural  system. 


These  transitional  tools  will  map  linguistic 
descriptions  of  situations  to  formal  systems  and 
computational  inference  engines. 

A semantic oppositional scale between
natural systems and computational systems
suggests that human perception uses
computation-like iconic representations (mental
images) as well as language to describe the
world. For example, we observe the cognitive
act  of  counting  as  a  singular  process  only 
because  this  act  is  already  deeply  embedded  into 
mature  neural  processes.  Moreover,  the 
cognitive  act  of  counting  is  but  one  manifestation 
of  the  mechanisms  of  mental  imaging.  These 
mechanisms  include  tools  for  measurement  of 
new  observables  by  living  systems.  In  a  generic 
sense,  these  mechanisms  function  as  part  of  a 
process  cycle  expressed  as  action  followed  by 
perception. 

Ecological processes: Action-perception cycles
are emergent from the physical properties of
subsystems and are driven by a class of periodic
forcing functions. The scholarly research in
ecological psychology and ecological physics
views periodic forcing as an essential part of the
temporal stratification of biological organization
into levels (see Prueitt, 1997 for review). In this
view,  any  one  level  of  organization  contains 
objects  that  are  emergent  constructions  from  a 
relatively  stable  substrate  of  basic  elements. 


Figure  2:  Natural  systems  stratify 


The  relationship  that  must  always  exist 
between  a  substrate  and  a  natural  system  is  the 
critical  issue  that  must  be  resolved  in  any 
situational  analysis  of  natural  systems.  In  Figure 
2  each  level,  Lj  ,  has  the  capability  of  forming  a 
substrate  from  which  new  organizational  wholes 
may  emerge.  Each  of  these  levels,  and  their 
interaction,  can  be  partially  simulated  by  a 
thermodynamical  model. 

Natural  systems  are  "open"  systems 
with  varying  degrees  of  openness.  For  example, 
enzymes  produce  micro  environments  where 
constraints  are  placed  on  the  "meaning"  of 
assembled biochemical agents, and in the
hippocampus micro environments fuse the
associational traces of implicit memory into
compartmental invariances that are "judged" by
the processes of other limbic systems and of
cortical regions. These micro environments are
agile, semi-closed, flexible, and capable of
responding to novelty and nonstationarity in the
environment.

The stratified thermodynamical model
of open systems, in Figure 3, has micro
environments that emerge through some coherent
aggregation of compatible processes. The model
can be used under the assumption that the
average lifetime of processes at a certain level
differs by many orders of magnitude from the
average lifetime of the processes at the next
level. For example, atoms form one level and
chemicals form a different level.


H|(x,x)  -  K(x)  +  P(x)  +  D(x,x)  +  E(x,x) 

Figure  3:  Stratified  thermodynamical  model  of  open 
systems,  where  one  of  the  levels  has  collapsed. 

The collapse of one of the micro
environments leads to the diffusion of material
into an environment that supports the reassembly
of this material, and other similar material, into
new  micro  environments.  As  these  new 
environments  arise,  the  structure  and  function 
reflect  the  characteristics  of  the  material  as  well 
as  the  state  of  the  environment. 

Application to Machine Intelligence: Our
approach to machine translation and knowledge
representation assumes the existence of a table
(database) where the system states, i.e. those
states that a compartment can assume, are all
specified and related to a database of subfeatures.
We have a system that is specified in a formal
fashion and which is computable. In this case, a
theory of the world is provided in the form of a
computer program, with data, as a
"computational ontology".

Within the compartmental boundary of
this program, an underlying ontology, as
expressed in a semantic net or table, can assume
different system states and thus capture how the
sense of the terms may drift. The rules that
govern  an  ontology  allow  a  modification  of  the 
sense  of  the  terms.  This  gives  us  hope  that  issues 
of  interlingua  and  pragmatic  type  can  be 
addressed  within  individual  compartments 
(Citkina,  1996).  However,  the  procedures  for  the 
formation  and  dissolution  of  these  programs  must 
allow  for  the  limitations  that  formal  systems  have 
in  general  and  that  the  specific  program  has  in 
particular.  How  is  this  accomplished? 

Machine  intelligence  can  also  have  a 
layered  architecture  where  data  constructions  in 
one  layer  are  combined  to  produce  flexible 
concepts  using  partially  defined  relationships. 
An  inductive  analysis  of  the  situation  is  then 
possible,  where  a  fusion  of  data  occurs  based  on 
the  accommodation  of  characterizations  of  the 
novelty  of  the  situation  with  characterizations  of 
the  background  knowledge.  This  fusion  must 
have  the  nature  of  learning  (Michalski,  1994). 

Learning  involves  a  reorganization  of  an 
underlying  constructive  field  that  has  certain 
general  properties  (Pribram,  1991).  For 
example,  the  system  states  of  advanced 
computational  ontologies  may  be  evaluated  with 
respect  to  oppositional  scales  (Pospelov,  1994). 

Figure 4: Oppositional scales provide metrics for the
interpretation of phenomena.

Oppositional scales are specified, as
expected, within a semantic space that reflects
pairs of evaluators, such as good and bad, and
qualitative properties of the system states (Figure
4). The formation of the compartment for
modeling a specific situation might deal directly
with the observation of data and the extraction,
from this data, of a minimal set of oppositional
scales. The use of these oppositional scales must
include a logical means for restructuring the
semantic space by using a group transformation
on the space's basis set.

Transformation  rules  are  augmented  by 
quasi  axiomatic  theory  as  an  "open"  algebraic 
transformation  to  allow  the  introduction  of  new 
observables  and  the  formalization  of  second 
order  cybernetic  transformations  of  semantic 
spaces.  The  properties  of  these  semantic  spaces 
are  representable  in  the  form  of  a  database  plus  a 
specific  situational  language  and  logic. 

Various approaches have been made
towards a formal description of second order
cybernetics, including quasi axiomatic theory
(Finn, 1991), constructive induction (Michalski
& Ram, 1995), and connectionist architectures
for resolving explanatory coherence (Thagard,
1988). However, none of these formalisms has
had the success enjoyed by simple biological
systems. Complex natural systems are open
systems that (1) are embedded in a larger space,
(2) are composed through an assembly process,
(3) have behavior properties with internal and
external work cycles, and (4) can be described as
variably stratified (Figure 5).

Figure 5: The Target of Investigation is observed a number
of times

Because of these four properties, the Target of
Investigation must be observed a number of
times

T = { O_1, O_2, ..., O_n }

• each "observation", O, of the target has a
"bag" of properties P = {p_1, p_2, ..., p_s}.
The cardinality of the bag of properties is
finite (but open).

• each "observation", O, is composed from a
"bag" of basic elements A = {a_1, a_2, ...,
a_k}. The cardinality of the bag of elements is
finite (but open).

• logical hypotheses (J.S. Mill) can be posed:

1) does an observation have (not have)
property p?

2) for each basic element a_i, is a_i a cause
of (or not a cause of) the property p?


Of  course,  the  descriptions  of  properties  and  the 
descriptions  of  basic  elements  of  the  Target  of 
Investigation  must  be  possible. 
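
The two kinds of hypotheses above can be illustrated with a small Python sketch. It is a simplified reading of Mill's methods of agreement and difference, not the inference operators of Quasi Axiomatic Theory: an element is kept as a candidate cause of p only if it occurs in every observation that has p and in no observation that lacks p. The function name and the example data are hypothetical.

```python
def candidate_causes(observations, p):
    """Return the basic elements that may be a cause of property p.

    observations: list of (elements, properties) pairs, both given as sets.
    """
    positive = [set(els) for els, props in observations if p in props]
    negative = [set(els) for els, props in observations if p not in props]
    if not positive:
        return set()
    # Method of agreement: elements shared by all observations having p.
    common = set.intersection(*positive)
    # Method of difference: drop elements also seen where p is absent.
    for els in negative:
        common -= els
    return common


# T = {O_1, O_2, O_3}: each observation has a bag of elements and properties.
observations = [
    ({"a1", "a2", "a3"}, {"p"}),
    ({"a1", "a3"}, {"p", "q"}),
    ({"a3", "a4"}, {"q"}),
]
causes_of_p = candidate_causes(observations, "p")
causes_of_q = candidate_causes(observations, "q")
```

Each new observation can only shrink the candidate set, which is one reason the target has to be observed a number of times.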

The  quality  of  any  automated  reasoning 
system  is  a  function  of  its  power  to  reveal  the 
basic  signature  of  a  situation  under  investigation 
(see  Ritz  and  Huber,  1996).  General  systems 
properties,  such  as  oppositional  scales,  are 
required  to  extract  and  reveal  this  signature.  A 
system  that  creates  and  resolves  paradoxes  will 
produce  complementarity  described  by  the 
oppositional  scales  mentioned  in  Pospelov 
(1994).  Multiple  viewpoints  are  then 
accommodated  through  the  emergence  of  a  new 
system  for  understanding  both  ontologies  and 
their  natural  inter-relationships;  and  a  new  player 
in  the  situation,  the  accommodation  itself. 

Block Diagram for Situational Analysis: The
block diagram in Figure 6 is open to architectural
instantiations of computational systems that
duplicate the structural mechanisms that
experimental neuropsychology suggests are used
by biological systems.

Figure 6: Block diagram for Situational Analysis (the diagram
includes a block labeled "Interpretation Space")

First, a data source is attended to by the
system using measurements about properties and
features. These properties and features are
represented (approximated) as elements of a set
of primary components. For example, the
Fourier transform produces a representation of
the spectral domain as the sum of weighted
principal components. The Gabor representation
may play a similar role in the human cortex (see
Prueitt, 1997 for literature review). The theme
vector transformation of a document's content
produces a similar representation (Rijsbergen,
1979; Prueitt, 1996b). The type space is then
projected into a system for visualization. In the
case of human perception, this projection is a
reentrant bi-projection between the lateral
geniculate nucleus and the layered visual cortex.
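
The idea of representing a measurement as a sum of weighted components can be made concrete with a discrete Fourier transform. The following Python sketch is a generic textbook construction added for illustration; the function names are hypothetical and nothing here is specific to the theme vector transformation or the Gabor representation mentioned above.

```python
import cmath


def dft_weights(signal):
    """Weights c_k such that signal[t] = sum_k c_k * exp(2j*pi*k*t/n)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)) / n
        for k in range(n)
    ]


def weighted_sum(weights):
    """Rebuild the real signal from its weighted complex exponentials."""
    n = len(weights)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n)
            for k, c in enumerate(weights)).real
        for t in range(n)
    ]


signal = [1.0, 0.0, -1.0, 0.0]  # one period of a cosine, sampled 4 times
weights = dft_weights(signal)   # the "weighted components"
rebuilt = weighted_sum(weights)
```

Only two of the four weights are non-zero for this signal, so the component representation is far more compact than the raw samples, which is what makes such a space useful as input to a later projection.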

Very few software systems have
visualization of theme (feature) space as well as
an integrated representation of knowledge in the
form of a concept database. Concept databases
exist in a few cases (Abecker et al., 1997), but
are primarily restricted to mechanical systems or
industrial plants. Software packages such as
Spires, developed by Pacific Northwest National
Laboratories, make projections into feature
spaces based on cluster analysis. New
computational projection schemes are being
based on situational semantics. The object of
visualization might be related not only to a
themespace but also to a concept space.

The  scientific  challenge  is  to  see  both 
computational  visualization  and  human 
perception  within  the  same  paradigm.  This  may 
not  be  so  difficult.  A  large  amount  of 
experimental  research  exists  and  some  of  this  has 
been  integrated  into  explanatory  frameworks. 
The  study  of  these  frameworks  suggests  that  the 
human  brain  achieves  the  formation  of  concepts 
through  a  distributed  disassembly  and  reassembly 
of representational features (Figure 7). This
framework  can  be  seen  to  be  illustrative  of  a 
general  systems  property  regarding  the 
emergence  of  operational  wholes  within 
ecosystems. 


Figure 7: The niches in an ecosystem share in the common use
of a finite class of natural types.

It  is  also  suggested  that  the  human  in 
vitro  concept  space  is  a  virtual  space  in  the  sense 
that  the  space  does  not  actually  ever  exist  in  total 
in  any  specific  circumstance.  Parts  of  this  virtual 
space  come  into  being  while  other  parts  are 
blocked  by  various  types  of  competitive 
cooperative network dynamics. Of course, this
simple architecture disguises the complexity of
how the brain uses both its neural architecture
and its chemical composition.

A  projection  from  a  complete 
enumeration  of  a  knowledge  engineering  type 
concept  space  can  be  made  onto  a  concept 
subspace.  In  Figure  4  this  is  called  an 
interpretation  space.  This  subspace  may  be 
mirrored  by  activation  of  components  of  a 
situational  model  supporting  automated 
reasoning.  The  mirrors  can  be  maintained  by  a 
neural  network  associative  memory  as 
demonstrated  in  (Dubchak  &  Muchnik,  249). 
The  mirrors  can  be  two  way,  and  as  a 
consequence  a  separate  mapping  back  to  a 
concept  subspace  may  be  made  after  an  inference 
engine  has  changed  the  state  space  of  the 
situational  model.  This  notion  of  projections  and 
mirrors  between  representational  spaces  is  a 
reasonable  first  model  for  the  production  of 
computer  based  situational  models.  However, 
there  are  a  number  of  scientific  issues  that  are  not 
yet  resolved. 

The structural form of a computer based
concept space has not been worked out. Simple
themespaces have been defined, but a new class
of objects is needed in themespaces. These
objects will create topological distortions of the
otherwise flat Euclidean space, and instantiate a
theory of semantic operators defined in
coincidence with the inference engines.
ABSTRACT

This  paper  proposes  a  three-level  hierarchical  mathematical 
model  for  intelligent  and  complex  systems.  Motivation, 
background,  rationale  and  interpretations  are  given.  Illustrated 
in  a  block  diagram  are  different  visions  from  different  schools 
of thought. The concept of "soft-modeling," which represents a
dramatic departure from conventional mathematical modeling,
is also presented. The characteristics, approach, types of
mathematics used, and the interpretations that will eventually
lead to engineering realization and commercialization are also
presented.

KEYWORDS: soft modeling, intelligent systems, hybrid control,
intelligent control, computational intelligence, fuzzy logic, first order
predicate calculus, fuzzy constraint network, cognitive science, pattern
recognition, adaptive intelligence, control of complex systems,
computing with words, soft optimization, soft computing.

1.  INTRODUCTION 

In recent years, we have observed the rise of the concept of
"soft computing," both in the recognition of its importance and in
the number of technical papers published in the open
literature [1,2,3,4,5,6,7]. No doubt, soft computing as conceived
by Lotfi A. Zadeh has influenced, and will continue to influence,
intelligent engineering. Because its foundation is
biologically inspired, soft computing can solve difficult
problems, especially nonlinear problems. At the same time,
Lotfi A. Zadeh and many other leading researchers in the field
agree that the most fruitful areas of application will concern
"complex systems" and "intelligent systems." The definition
of "intelligent" is very complicated, and we will not be able to
draw any sensible conclusions about it for a very long time. On a
more optimistic note, the desirable features that constitute
intelligence, such as adaptation, reasoning, generalization,
similarity, focusing attention, pattern recognition,
introspection, combinatorial search, etc., are useful and
important for making better products by improving the quality,
reliability, and performance of industrial products for human
beings. These improvements can be accompanied by lower
production costs and by savings in energy, materials, and
human effort.


As far as complexity is concerned, the impact will be at least
equally, if not more, profound. Complex systems, both
static and dynamic, are in front of us and surround us
everywhere. Historically, the understanding of these systems
has been limited because they are usually not mathematically
tractable. The types of mathematics referred to are those
recognized by earlier researchers or with already established
credibility in the sense of rigorous scientific methodologies
and traditions.

The  mathematical  system  theory  begins  with  mathematical 
modeling  of  pragmatic  and  realistic  systems,  physical  or 
manmade.  There  is  very  little  one  can  do  if  a  mathematical 
model cannot be established. In other words, none of the analysis
and engineering design that follow can proceed.
The mathematical model, in a sense, represents some
mathematical constraints that must be satisfied. This is the
place where the difficulties in analyzing complex systems
arise. In retrospect, all mathematical approaches
traditionally used in systems analysis belong to what one may
call the "=" or "equal" culture. In other words, the constraints
are exact. Unfortunately, exact constraints of this kind are not
always available when we set out to tackle complex systems.

The purpose of this paper is to propose a novel approach to
soft modeling. Soft modeling is a generalization of the
concept of mathematical modeling in the traditional sense.
Three areas of generalization are worth mentioning:

1.  The  generalization  in  mathematical  constraints. 

2.  The  generalization  of  the  type  of  mathematics  being 
used. 

3.  The  generalization  of  the  types  of  physical  or  manmade 
systems  to  be  treated  (i.e.,  the  mixture  of  the  physical, 
engineering,  and  social  economic  systems). 

2.  THREE-LEVEL SYSTEM ARCHITECTURE FOR INTELLIGENT
AND COMPLEX SYSTEMS

To ease the discussion of the three-level system architecture,
Figure 1 illustrates this system in terms of three separate
blocks. Furthermore, to cover more ground, the different schools
of thought that currently exist are also listed. (Please bear in
mind that the investigation of systems of this type can quickly
become extremely complicated when the features of intelligence and
learning are to be incorporated.)

[Figure 1 block diagram. Panels: (i) Taxonomic Vision; (ii) Cognitive
Science Vision; (iii) Soft Computing Vision; (iv) Hard Computing Vision;
(v) Scientific Discipline Classification. Block labels recovered from the
figure: Top Level, Middle Level, Bottom Level; Soft Modeling, Soft
Computing, Conventional Controls; Social-Political-Economy, Operations
Research, Machine Intelligence; Computer Science, Intelligence,
Interdiscipline; Classical & Modern Control; Control Engineering.]

Figure 1: Three-level hierarchical models for different schools of thought.

For the proper perspective on the kinds of problems being
addressed, A. C. Antoulas of Rice University [8] says, "...
Scientific activities can be divided today into two broad
categories. The first category includes by and large the natural
sciences ... Its objective is to investigate fundamental
properties of matter, and big bang, black holes ... The
second  category  is  concerned  with  phenomena  and  structures 
which  display  high  complexity.  These  can  be  found  in  nature, 
biological  phenomena  and  the  structure  of  molecules,  like 
DNA,  are  two  examples.  But  for  the  most  part  they  are 
artificial,  generated  by  disciplines  such  as  engineering, 
computer  science,  cybernetics,  ecology,  operations  research, 
economics,  etc." 


In other words, the complex systems described in this paper
belong exclusively to the second category.
Although we may be deeply involved with the study of brain
phenomena in order to draw biological inspiration from it,
we are addressing neither DNA nor crystal structures.
Antoulas [8] went on to say, "... The main distinction
between these two categories is their system component: the
former has a small system component while the latter has a
large system component."

Before we enter the detailed description of the three levels of
the proposed hierarchical structure, each level will be
briefly illustrated with three practical complex systems: (1) a
large hotel complex, (2) an automatic manufacturing plant,
and (3) the command of a battlefield.

In a nutshell, a three-level hierarchical model consists of three
levels. (1) The top level is the mission statement, or the constraints
imposed by society on a meaningful human endeavor. The
soft modeling proposed in this paper is intended for
modeling this level. (2) The bottom level is the existing
technology. (3) The middle level, on the other hand,
is the domain of cutting-edge research and development.
These three levels are:

1.  Top-Level  Subsystem  —  The  goal  of  achieving  some 
global  optimization  for  a  large  complex  hotel  (example  1)  is 
to optimize yield management subject to a set of constraints. The
set of constraints includes law, public policy, economic
conditions, number of available workers, several reservation
priorities, geographical location of the hotel complex, seasonal
variations, special events and cultural activities, etc. In order
to establish a mathematical model successfully, a new modeling
methodology (soft modeling) is needed to make use of
different mathematical tools.

Example 2, similarly, has a set of constraints. These are very
realistic constraints that the complex system must satisfy:
purchasing orders, availability of raw materials, defective
parts replacement, size of the available workforce,
international trade, inventory, public policy, updating of legal
codes, environmental regulations, upgrading of the production
lines, etc. And, finally, soft modeling is also needed for the
battlefield command of Example 3. Availability of
communication networking, food and material supplies,
injured soldiers, field hospitals, transportation, ammunition and
weapons supplies, intelligent information fusion, etc. are just
a subset of the constraints that must be handled.

Yes, these are constraints that come in different sizes, forms,
units, measurements, emphases, criteria, values, inherited
cultures, etc. That is precisely the motivation for the
novel concept of soft modeling.

2.  Middle  Level  Subsystem  —  This  subsystem  is  the 
most important and exciting component of the complex
system.  The  recognition  of  its  importance  has  been  firmly 
established  by  a  number  of  leading  researchers.  However, 
this  is  not  the  main  theme  of  this  paper,  because  we  are 
discussing  the  issue  of  the  top-level  subsystem.  For  the  sake 
of  completeness  and  an  overall  picture  of  how  to  handle 
complex  systems  issues,  J.  S.  Albus  of  NIST  summarizes 
"enabling  technology"  in  his  paper  [9].  "The  latter  half  of  the 
20th  century  has  seen  a  number  of  fundamental  breakthroughs 
that  have  radically  altered  the  landscape  of  technological 
possibilities: 

a.  The power and speed (per unit cost) of electronic
computing has risen exponentially by more than a factor of ten
per decade for over four decades. . .

b.  The store of knowledge about how the biological
brain works has also increased many fold over the last half
century. . .

c.  Solid theoretical and mathematical foundations
have been established and progress is rapid in the scientific
study of language and speech understanding, image
understanding and perception, knowledge representation,
simulation, reasoning and decision making, planning and
control.

d.  Engineering approaches have begun to emerge
for designing, constructing, testing, and exploiting intelligent
systems for practical applications in commerce and industry."

In large measure, one may say that the soft modeling concept
as proposed in this paper was motivated by item d quoted
above. Tremendous research and development opportunities
exist in the course of the realization of item d. It is fair
to say that a satisfactory answer for the middle-level
subsystem's problems might take a long time to obtain.

3.  Bottom-Level  Subsystem  —  There  is  very  little  that 
needs to be said about this third-level subsystem. It would not
be far off to say this subsystem is already here. During and
after World War II, researchers and engineers brought
us to the present level of maturity. However, this is not to
say that we do not have new problems. In fact, there is a
whole research community of hybrid systems conducting
pioneering work in this area. The 5th Hybrid Systems
Workshop, headed by Panos Antsaklis, will take place at
Notre Dame [10]. Hybrid systems research is very solid and
down to earth. As we look to the future of intelligent systems
research, it is worth considering an even deeper system
structure. Referring to Figure 2, one can look at hybrid
system design as a special case of the proposed three-level
model in at least two respects: first, the automaton is just a special
case of a much more general mathematical linguistic model; second,
without considering the top-level subsystem, the problem
of both intelligent and complex systems has not been fully
addressed, in our opinion.

The  research  activities  for  the  middle  level  subsystem  are  very 
active  at  present  and  will  continue  to  be  very  active  for  a  long 
time  to  come.  Again,  the  main  concern  of  this  paper  is  soft 
modeling  of  intelligent  and  complex  systems. 
Figure 2: Hybrid control systems.

In Figure 1, five hierarchical block diagrams represent five
different visions depending on different schools of thought.
The  semiotic  view  by  Alex  Meystel  and  James  Albus  is  not 
illustrated  here  because  it  is  difficult  to  present  it  in  this 
structure.  Please  refer  to  Meystel's  original  works  for  proper 
reference  [9]. 

One thing is certain: with a fuzzy constraint network (FCN) as
the top subsystem and approximate reasoning in the middle, a much
better performance of a complex system can be realized, as shown in
Figure 3.


[Figure 3 block diagram, top to bottom: Fuzzy Constraint Network (FCN);
Approximate Reasoning; Control Systems.]

Figure 3: A fuzzy-based realization of soft-computing model.

3.     SOFT MODELING

With all the above background, we now embark on the task of
explaining the concept of "soft-modeling." By no means do
we intend to borrow fancy words merely to gain attention.
Rather, the term is an outgrowth of necessity.

In Section 2, item 1, we introduced three realistic examples of
a top-level subsystem. Traditionally, for this type of problem,
the modeling has been done by operations researchers. Their
approaches are not unique. One standard approach is
linear programming. Another approach, introduced by Lotfi
A. Zadeh [5,6], is a fuzzified version of the First Order Predicate
Calculus (FOPC) method. The novel aspect described in
this paper is the interface between the top-level and the
middle-level subsystems. In a later section, we shall also show a
linear programming approach that is a special case of the
fuzzy FOPC formula.

To appreciate the need for soft modeling, we must go back and
study the evolution of conventional control. It is well
recognized that the first mathematical model to describe plant
behavior for control purposes is attributed to J. C. Maxwell,
of Maxwell's equations fame, who, in 1868, used
differential equations to explain instability problems
encountered with James Watt's flyball governor [11]. Control
theory has made significant strides since 1868. During the
1930s, World War II, and the 1940s, frequency
domain methods and Laplace transforms were the dominant
methods for analyzing and designing control systems.

What does the above statement tell us? That we can solve
only problems that are linear and time invariant? In fact, only
linear time-invariant systems can be solved this way because both
Fourier and Laplace transforms are linear transformations.
Conventional control systems have always been designed using
mathematical models of physical systems. A valid
mathematical model is a prerequisite for control system
design. Yet, with this very restricted and often unreasonable
approximation (linearization), it is obvious that we have left
many problems unsolved. In fact, in conventional control
approaches, a mathematical model that captures the dynamic
behavior of interest must be simple enough to be analyzed
with the available mathematical techniques and accurate
enough to describe the important aspects of the relevant
dynamic behavior [11]. For nearly half a century, engineering
students were trained for the most part via so-called small
signal theory. This kind of approximation only delivers the
behavior of a plant in the neighborhood of an operating point.
This is true for all electronic, vacuum tube, transistor, and VLSI
products as well as electrical, mechanical, optical, hydraulic,
and thermal systems and any combination thereof. So what is
the point? The point is that serious restrictions are artificially
imposed in order to use frequency domain analysis and
Laplace transforms. These restrictions basically rule out the
opportunity to solve many important and pressing complex
problems. This observation leads to one of the explanations
for the need for soft modeling.
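
The restriction to the neighborhood of an operating point can be made concrete with a minimal sketch. The pendulum plant, its parameters, and all function names below are illustrative assumptions, not taken from this paper or from [11]; the code only compares a nonlinear model with its first-order (small-signal) approximation around one operating point.

```python
import math

def pendulum_accel(theta, g=9.81, length=1.0):
    """Nonlinear plant (assumed example): angular acceleration of an undamped pendulum."""
    return -(g / length) * math.sin(theta)

def small_signal_model(f, x0, h=1e-6):
    """First-order Taylor model of f around the operating point x0."""
    f0 = f(x0)
    slope = (f(x0 + h) - f(x0 - h)) / (2 * h)  # central-difference derivative
    return lambda x: f0 + slope * (x - x0)

# Linearize around the hanging equilibrium theta = 0 (hypothetical operating point).
linear_model = small_signal_model(pendulum_accel, 0.0)

for theta in (0.05, 0.5, 1.5):
    exact = pendulum_accel(theta)
    approx = linear_model(theta)
    print(f"theta={theta:.2f} rad  exact={exact:8.3f}  linear={approx:8.3f}  "
          f"error={abs(exact - approx):.3f}")
```

Close to the operating point the two models nearly coincide, while far from it the error of the linear model grows, which is the restriction discussed above.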

The second reason deals with the nature of the systems we
must learn to cope with. In contrast to mathematical modeling
that is almost always based on the Hamiltonian principle and
Lagrangian equations for continuous mechanics, there exist
few guidelines for the three different types of examples
described in Section 2. Depending on the designers'
backgrounds and disciplines, the outcomes of modeling can be
very different. The fact is that no principle similar to the
Hamiltonian and Lagrangian is available for use in solving the
three examples. In other words, new methods of solving these
kinds of examples must be developed. These methods would
be different from ordinary differential or difference equations
and static algebraic equations. For example, binary logic,
fuzzy logic, the concept of relations, and the operations on subsets
all of a sudden become very important. Predicate Calculus
and First Order Logic are very general methods for solving
problems when the theory is applied to many different physical
or human societal problems [12]. Fuzzification does not alter
much in terms of modeling, but it makes a difference in terms
of problem-solving abilities.

The dramatic difference in soft modeling is brought about by
the type of mathematics used. For example, we have seen
what linear programming methodology can do for operations
researchers. This methodology includes the so-called
"inequality," written symbolically as "<," which is a relation in
the mathematical sense. To fully understand soft modeling, one
can think of the solution for a state vector, which may not be
unique, so that one must talk about a "solution space" instead of just
a unique "solution." In optimal control theory, the solution is
nearly always unique. In the case of soft modeling, this will
not hold true. This very fact may be justification for the
nomenclature "soft computing" as coined by Lotfi A. Zadeh
[3]. The word "soft" means that the solution will be a "solution
space" as a direct consequence of the soft modeling.

Several important observations concerning soft modeling are
in order (refer to Figures 4 and 5):


[Figure 4 labels: Finite Solution Space; Infinite Solution Space; Soft-Modeling.]

Figure 4: Solution space.

[Figure 5 label: Shrinking of Solution Space.]

Figure 5: Learning of soft-modeling.

Observation No. 1: There are two types of solution
space, finite and infinite.

Observation No. 2: The quality of soft modeling depends
on the size of the solution space; the smaller, the better.

Observation No. 3: An infinite solution space can be
changed to a finite solution space. In principle, this is always
possible for manmade social-economic systems. The real
question is at what cost.

Observation No. 4: Normally, a solution space cannot
be reduced to a single point in state space. Should it happen
that the solution space is reduced to a single point under very
special conditions, then the system involved can no longer be
called a soft model.

Observation No. 5: The learning problem as defined by
soft modeling is intimately related to the shrinking of the
solution space.

Observation No. 6: The optimization process for
soft-modeled systems begins where the final solution space is
finished. (Actually one says the soft system is grounded or it
is established.) The optimal solution then can be established
by searching for the optimum over the solution space.
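
These observations can be illustrated with a small sketch. The staffing variables, the three inequality constraints, and the objective below are hypothetical and chosen only for illustration; the code enumerates a finite candidate grid, shows the solution space shrinking as constraints are added, and then optimizes over the final space.

```python
from itertools import product

# Hypothetical decision variables: two staffing levels, each between 0 and 10.
candidates = list(product(range(0, 11), repeat=2))

# Each constraint is an inequality, so it keeps a region rather than a single point.
constraints = [
    ("budget",   lambda x, y: 3 * x + 2 * y <= 24),
    ("coverage", lambda x, y: x + y >= 6),
    ("policy",   lambda x, y: y <= x + 2),
]

def solution_space(points, active):
    """All points that satisfy every constraint in `active`."""
    return [p for p in points if all(test(*p) for _, test in active)]

# Learning as shrinking: add constraints one at a time and record the size.
sizes = [len(solution_space(candidates, constraints[:k]))
         for k in range(len(constraints) + 1)]
print("solution-space sizes as constraints are added:", sizes)

# Optimization begins only once the final solution space is established.
final_space = solution_space(candidates, constraints)
best = max(final_space, key=lambda p: 5 * p[0] + 4 * p[1])  # assumed objective
print("points in final space:", len(final_space), "best point:", best)
```

The final space still contains many points; the optimum is selected from it afterwards, rather than being the unique solution of an exact equation.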

4.      FUZZY  CONSTRAINT  NETWORK 

Fuzzy Constraint Network (FCN) is the fuzzy version of
FOPC and has been recognized as a key methodology in the
top-level subsystem in our hierarchical structure. FCN was
first presented to the intelligent system community by James
Bowen and Robert Lai in Lai's dissertation [13,14]. Didier
Dubois and Henri Prade have also been working on a
somewhat different approach for many years [15].
Subsequently, FCN has been investigated further by Paul P.
Wang, Dennis Bahler, and C. Y. Tyan [16]. The application of
FCN has always been fairly broad, and consequently there is
no lack of papers in this field. Potentially, FCN can play a
very important role in soft modeling.

As discussed in Section 1, the concept of mathematical
modeling must be critically revisited in order to meet the
challenges of solving intelligent and complex problems.
Some thoughts on this very issue are in order. Since the
systems are complex, suitable mathematics for producing
quantitative results may not be readily available. As Lotfi A.
Zadeh has demonstrated, fuzzy logic can solve complex issues
that are otherwise impossible to solve [2]. One can view
mathematical modeling as an attempt to search for some kind
of mathematical language that provides useful information and,
consequently, makes useful numerical values available. On
the other hand, it is not one's intention to make this
overcomplicated. Cover and Thomas have discovered Einstein's
saying in a Chinese fortune cookie: "Everything should be
made as simple as possible, but no simpler." [17] Upon
further examination of the three examples, one discovers that
fuzzy FOPC has indeed exhibited remarkable expressive power
in modeling the reality of a complex system. This kind of
model can also be made quite robust and should be
constructed in an adaptive manner. This is precisely what the
FCN methodology is good for.
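
As a minimal sketch of how a fuzzy constraint network might be represented, the following code assumes one common formulation: each constraint returns a degree of satisfaction in [0, 1], an assignment is scored by the minimum degree over all constraints, and a threshold (an alpha-cut) selects the solution space. The hotel variables, the membership shapes, and the function names are hypothetical, and the formulation is not claimed to be that of [13,14,16]. A crisp constraint appears as the special case whose degree is only 0 or 1.

```python
from itertools import product

# Hypothetical variables and finite domains for the hotel example.
domains = {"rooms_sold": [80, 90, 100, 110, 120], "staff": [4, 5, 6, 7, 8]}

def ramp(lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi (falling if lo > hi)."""
    def mu(v):
        return max(0.0, min(1.0, (v - lo) / (hi - lo)))
    return mu

# Each constraint: (scope of variable names, membership function over that scope).
constraints = [
    (("rooms_sold",), ramp(80, 120)),   # soft preference: sell many rooms
    (("staff",), ramp(8, 4)),           # soft preference: use few staff
    (("rooms_sold", "staff"),           # crisp capacity rule: degree is 0 or 1
     lambda rooms, staff: 1.0 if rooms <= 20 * staff else 0.0),
]

def satisfaction(assignment):
    """Degree to which an assignment satisfies the network (min aggregation)."""
    return min(mu(*(assignment[name] for name in scope))
               for scope, mu in constraints)

def alpha_cut(alpha):
    """Solution space: all assignments satisfied to degree >= alpha."""
    names = list(domains)
    result = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if satisfaction(assignment) >= alpha:
            result.append(assignment)
    return result

for alpha in (0.25, 0.5):
    print(f"alpha={alpha}: {len(alpha_cut(alpha))} assignments in the solution space")
```

Raising alpha shrinks the solution space, and if every membership function were crisp the network would reduce to an ordinary constraint satisfaction problem.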

In  conventional  control  theory,  the  design  of  a  physical  system 
often  consists  of  four  steps  [18]: 

1.  Modeling

2.  Setting up mathematical descriptions

3.  Analysis

4.  Design

The physical system, theoretically speaking, is an object
existing in the real world; its precise characteristics are often
unknown to us. The difficulties will increase many fold when
the  systems  are  complex.  Our  final  objective,  of  course,  is  to 
accomplish  items  1  through  3  successfully  and  eventually 
develop  a  good  design.  With  this  perspective  in  mind,  the 
research  needed  for  this  proposed  system  hierarchical  structure 
is  only  the  beginning.  Indeed,  we  have  a  long  way  to  go. 

5.  CONCLUDING  REMARKS 

As early as 1958, researchers [19] recognized that a logic circuit
can be incorporated in a conventional control system for one
of two general purposes: for adjusting the control system
parameters or for overriding the conventional control system.
It took many years for hybrid control systems to become a
serious research topic [10]. The thesis of this paper actually
is quite straightforward. Since fuzzy logic is much more
general than bivalent logic and its ability to handle imprecise
and incomplete situations has been demonstrated many times
already, the proposed three-level architecture, by adopting FCN,
ought to serve as a general problem solver [16].

"Three-level architecture" is a generic term, analogous to
"three-layer neural network." Theoretically speaking,
because the architecture of a large complex system can assume
many variations in form and structure, many
combinations of the basic building blocks are possible.
Depending on the nature of the real-world problems, the
number of possible combinations and variations can indeed be
huge. What we have presented in this paper is just the very
basic structure of a generic subsystem.

System theory is a discipline which aims at providing a
common abstract basis and unified conceptual framework for
studying the behavior of various types and forms of systems
[19]. The foundational issues covering fundamental concepts
such as system, state, stability, controllability, observability,
determinateness, equivalence, etc. are not going to be easy to
settle for intelligent systems of this type. It is highly possible that
new concepts will be developed as we move along. On the
other hand, we are fortunate to have a collection of various
methods, techniques, and algorithms designated as soft
computing for studying the behavior and design of complex
systems.

The  study  of  intelligence,  learning,  and  training  has  not  been 
mentioned  in  this  paper  because  our  focus  has  been  on  the 
concept  of  soft  modeling  for  complex  systems.  Many 
problems  remain  to  be  investigated. 
Abstract

Computational Embodiment is the computer implementation
of principles that we believe will lead to more autonomous
and self-generating behaviors that will allow
software systems to exist in and interact with complex
environments. We restrict our attention here to symbolic
environments (MUDs), as an initial step towards understanding
and constructing "interaction spaces" in which
humans and computer programs can interact on an equal
footing.

Our approach to constructing autonomous software systems
is based on theoretical work on the structures that
underlie language and movement in biological systems and
on the structure of constructed complex systems mediated
or integrated by software.

This  paper  describes  some  of  the  important  lessons  to  be 
learned  from  biological  systems,  and  points  towards  some 
of  the  principles  of  construction  of  autonomous  systems. 


1  Introduction 

In this paper and its companion [17], we describe an approach
to constructing autonomous agents that is based
on theoretical work on the organization of structures that
underlie language and movement and on the structure of
constructed complex systems mediated or integrated by
software. We are not only interested in control of such systems
"to keep the system within a desired space" (which to
us includes planned or computed reactions to unplanned
events or behaviors), but also in instrumentation to find
out what is going on inside and outside the system, in
on-line management of system resources for efficiency, in
negotiation among system resources, and in system architectures
that will help us design, build, use, maintain, and
analyze such systems.

We are constructing autonomous software agents in symbolic
environments called Multi-User Domains (MUDs)
[20] [21] [9], using a style of "computational embodiment",
which is part of our research program on interaction spaces
[13].

In  our  view,  a  system  is  "autonomous"  if  it  can  be  said  to 
have  "purposeful  behavior",  e.g.,  act  independently  based 
on  internally  generated  intentions  [5].  There  are  really 
only  two  classes  of  (difficult)  requirements  for  effective 
autonomy:  robustness  and  timeliness.  Robustness  means 
graceful  degradation  in  increasingly  hostile  environments, 
as  well  as  an  ability  to  exploit  unexpected  aspects  in  the 
environment  to  one's  own  advantage,  which  to  us  implies 
a requirement for adaptability. Timeliness means that situations
are recognized "well enough" and "soon enough",
and  that  "good  enough"  actions  are  taken  "soon  enough". 
There  is  not  necessarily  any  optimization  here. 
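
The "good enough, soon enough" requirement can be sketched as a satisficing search under a fixed evaluation budget. This is only an illustrative assumption about how timeliness might be operationalized, not the mechanism described in this paper or its companion [17]; the action names, the scores, and the function satisfice are hypothetical.

```python
def satisfice(candidates, score, good_enough, max_evaluations):
    """Return (candidate, score) for the first candidate that is good enough,
    or for the best candidate seen when the evaluation budget runs out."""
    best, best_score = None, float("-inf")
    for count, candidate in enumerate(candidates, start=1):
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        # Stop early: either the action is good enough or time is up.
        if s >= good_enough or count >= max_evaluations:
            break
    return best, best_score

# Hypothetical actions with precomputed scores in [0, 1].
actions = [("wait", 0.2), ("retreat", 0.6), ("explore", 0.7), ("attack", 0.95)]
score = lambda action: action[1]

quick = satisfice(actions, score, good_enough=0.5, max_evaluations=10)
rushed = satisfice(actions, score, good_enough=0.99, max_evaluations=3)
print("good-enough choice:", quick)
print("budget-limited choice:", rushed)
```

Neither call examines every action, so neither result is guaranteed to be optimal; each is simply acceptable within the allowed effort.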

In  this  paper,  we  describe  some  fundamental  theory 
of  computational  embodiment,  primarily  theory  derived 
from biological systems (Section 2). In the companion paper
[17], we show how our notion of wrappings as dynamic
infrastructure  supports  the  theory,  and  describe  how  the 
architecture  we  use  is  being  implemented. 

An  earlier  version  of  this  paper  appeared  in  [10],  and  a 
summary  in  [4]. 

2    Biological  Characteristics 

It  is  instructive  to  compare  computational  systems  with 
biological  systems,  since  those  are  the  only  ones  we  know 
that have any interesting complexity and variety of behavior
[5]. From the very smallest and simplest organisms,
biological systems have generative processes that
provide varying numbers of appropriately constrained actions
that retain sufficient flexibility for robust behavior.
These "controlled sources of variation" require both variation
(spaces within which choices are constrained) and
control (of the choices), and remain one of the most difficult
areas in theoretical biology. We note also that the use
of  language  for  communication  among  biological  entities 
can  be  viewed  as  an  extension  of  movement  processes  [3], 
both  being  essentially  defined  by  layers  of  symbol  systems. 

In  biological  systems,  generative  processes  make  use  of 
many  different  types  of  mechanisms  at  different  levels  of 
the system; everything from mechanisms that take advantage
of side-effects and even inevitable errors (e.g., the
inability to replicate DNA perfectly becomes eventually
a critical source of genetic variation, leading to different
phenotypes that can be acted upon by natural selection)
to mechanisms that allow partial or graded behavioral
responses (e.g., diffusion of neurotransmitters across
synapses, different conduction rates in bundles of neurons,
and modulation of chemical channels across cell membranes)
to mechanisms that allow rich cognitive control
over behavioral choices (and for us, at least, conscious
manipulation of symbol systems). Generative processes in
biological  systems  have  dynamic  behavior  that  can  be  used 
both  for  organizing  behavior  and  creating  information. 

In  contrast,  computational  systems  are  relatively  trivial, 
since  the  absence  of  generative  processes  means  that  the 
system  developers  must  provide  these  controlled  sources 
of  variation  explicitly,  generally  using  knowledge  bases 
and  computational  resources  [18].  This  need  to  build  all 
the  parts  and  all  the  selection  mechanisms  severely  limits 
our  ability  to  construct  large  systems  with  many  types  of 
adaptive  and  coordinative  behavior.  One  of  our  philosoph- 
ical principles  for  system  architecture  is  the  analogue  of 
the  "controlled  sources  of  variation"  mentioned  above:  ev- 
ery flexibility  we  want  in  a  constructed  system  must  have  a 
corresponding  coordinative  mechanism  that  manages  the 
flexibility.  We  use  the  notion  of  layers  of  symbol  systems 
to  interpret  action  and  behavior  as  extensions  of  computa- 
tion [3],  gaining  some  conceptual  simplicity  thereby,  and 
though  most  of  the  hard  questions  still  have  no  good  an- 
swers, we  have  made  some  progress  in  defining  essential 
features  of  complex  computing  systems  [10]  [12]  [13]  [14] 
[2]  [15]  [16]. 

Our purpose is to improve the capabilities of computer-based
agents, so that they can work with humans to perform
increasingly difficult cognitive tasks, such as finding
relevant information, filtering and refining the presentation
of information for our purposes, and organizing and
executing the sequence of command steps for such machines
as factory product lines or autonomous land vehicles.
We believe that the growing field of agents research
could make more use of the hard-won lessons from biological
research and robotics about what it means to carry out
intelligent functioning within ANY world (be it abstract
or physical).

Four of these lessons are:

1.  Developing  "ecological  niches"  or  portable  contexts 
for  computational  agents; 

2.  Creating  artificial  "embodiment"  for  abstract  agents; 

3.  Creating  entities  with  "social"  behaviors;  and 

4.  Developing  capabilities  for  growth  and  adaptivity  of 
behaviors. 

Before  we  expand  on  each  of  these  points  below,  we  want 
to  emphasize  here  that  each  of  these  directions  is  being 
considered  in  current  agent  research  [19],  but  often  only 
indirectly  or  implicitly.  We  suggest  that  by  placing  the 
agents  within  an  explicit  context  (with  its  own  processes 
for  being  examined  and  modified),  by  giving  agents  a  def- 
inite "shape"  and  body  within  this  context,  and  by  giving 
agents  a  means  of  socially  responding  to  other  agents  in  ex- 
plicit and  observable  ways,  we  can  strengthen  the  agents' 
performances  by  creating  new  and  stronger  information 
with  which  to  evaluate  our  ideas  of  intelligent  functional- 
ity by  agents.  We  believe  that  ecological  niches,  embodi- 
ment, and  social  intelligence  are  all  good  starts  to  build- 
ing agents  with  a  much  richer  repertoire  of  mechanisms 
for  adaptation  and  growth. 

The  grounding  of  agent  behaviors  within  an  observable 
context  lets  us  build  active,  generative  processes  that  can 
monitor  their  own  behavior,  change  it,  catalog  it,  and  learn 
from  it.  But  in  order  to  capitalize  on  the  creation  of  such 
processes,  the  agent  needs  (1)  architectures  that  allow  it 
to  pull  in  new  types  of  processing  resources,  and  (2)  self- 
reflective  capabilities.  In  a  companion  paper  [17],  we  de- 
scribe our  wrapping  approach  to  dynamic  infrastructure, 
which  provides  the  flexibility  and  infrastructure  that  sup- 
ports these  two  goals. 

2.1    Ecological  Niches  for  Agents 

In  nature,  the  capabilities  and  characteristics  of  any  given 
animal  are  strongly  related  to  the  environment  within 
which  it  lives.  The  context  within  which  the  animal  must 
perform  includes  the  physical  world,  the  other  species  that 
exist  within  that  area,  the  history  of  its  own  species  within 
that  environment,  and  its  immediate  social  interactions 
with  others  of  its  own  kind.  Biologists  continue  to  strug- 
gle with  understanding  the  kinds  of  powerful  principles 
that  lead  to  the  extraordinary  amount  of  specialization 
and variety that has resulted.

We need to take a lesson from biology here: there are no
such  things  as  "general  purpose  fish".  They  are  all  rec- 
ognizably fish,  and  hence  there  are  many  general  aspects 
that  we  can  use  in  creating  architectures  and  standard  pro- 
cesses for  "fishness".  However,  fish  are  specialized  by  liv- 
ing in  saltwater  or  fresh;  living  near  the  top  sunlit  waters 
or  being  bottom  dwellers;  by  surviving  near  fierce  preda- 
tors or  by  having  few  competitors;  by  living  in  schools 
or  developing  other  protective  mechanisms;  and  so  forth. 
Eventually,  we  want  to  enumerate  the  corresponding  "eco- 
logically" important  distinctions  for  "agents",  e.g.,  do  they 
live  on  the  Internet  or  in  a  digital  library;  do  they  have  to 
fight  for  our  attention  or  do  they  have  dedicated  lines  or 
other  access  to  us;  do  they  work  as  part  of  a  temporary 
alliance  of  known  (or  harder,  unknown)  resources,  or  are 
they  a  simple  logic  or  filter  component  in  a  larger  process; 
and  so  forth. 

The  research  advantages  of  building  explicit  contexts  for 
our  agents  are  clear.  Instead  of  trying  to  align  our 
"agents"  and  their  behaviors  to  a  hidden  or  implicit  con- 
text, we  can  start  to  represent  and  manipulate  "niches"  ex- 
plicitly, study  the  mechanisms  by  which  our  agents  relate 
to  that  kind  of  context,  study  the  behaviors  of  our  agents 
within  the  explicit  contexts,  and  begin  to  build  principles 
of  how  we  distribute  information  and  processing  capabili- 
ties between  the  agent  and  its  operating  environment. 

Multi-User  Domains  (MUDs)  are  an  interesting  new  kind 
of  groupware  that  incorporates  people  into  the  program. 
MUDs  have  become  enormously  popular  as  games  and  as 
educational  support  tools  over  the  last  few  years  because 
they  get  the  human  interactions  right  in  some  fundamen- 
tal sense,  and  because  they  engage  our  sense  of  "place". 
MUD  clients  and  servers  are  easy  to  obtain  and  run  (most 
servers  and  clients  are  free),  but  they  usually  only  provide 
text  worlds;  there  is  little  interaction  with  existing  tools 
that  are  outside  the  MUD;  though  some  have  construc- 
tion languages  that  allow  complex  programming,  it  is  the 
usual  kind  of  programming;  and  it  is  not  very  easy  to  ac- 
cess large  volumes  of  information.  We  have  been  exploring 
the  use  of  MUDs  as  an  ecological  niche  for  agents  [13]. 


2.2  Embodiment 

The next hard-won lesson from biology and robotics is that
intelligent functionality needs a body, that is, it needs a
shape that defines and limits its capabilities and scope. In
animals, bodies reflect both the short-term and long-term
history of a species' adaptation to its ecological niche. They
also reflect the physical constraints of the environments, and
the historical and immediate interactions, choices,
and methods of dealing with the environment (which
includes others). The embodiment is the grounding of the
agent and its means of relating to its environment.

Three  other  critical  issues  emerge  from  considering  bodies. 

The  first  is  that  an  embodied  agent  reminds  us  that  an 
agent  always  exists  within  its  world  and  that  it  is  always 
doing  something  within  that  world.  Even  "idling"  is  a  be- 
havior, the  result  of  which  can  be  quite  unpleasant  for  prey 
in  the  real  world,  or  simply  annoying  to  a  user  waiting  for 
performance.  Therefore,  we  must  account  for  the  causal 
connection  to  and  from  the  environment,  that  is,  the  ef- 
fects of  events  in  the  environment  on  the  agent,  and  the 
effects  of  agent  activities  on  external  objects.  The  agent 
can  learn  of  these  effects  or  cause  others  only  through 
the  specific  interface  between  the  agent  and  the  environ- 
ment. That  interface  must  make  actions  available,  so  that 
the  use  of  tools  and  other  activities  has  the  proper  effect. 
There must be appropriate sensors to collect information
for the agent from the environment; sufficient effectors,
so that the agent can change its relationship with the
environment (including by motion), announce information
to the environment, and make changes in the environment
via the use of tools; and probes, which are tasked
sensory activities governed by a focus of perception.
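
The interface just described (sensors, effectors, and probes as the
only channel between agent and environment) can be illustrated with
a small sketch. This is not the architecture of the companion paper
[17]; the class names, method names, and the toy one-dimensional
world are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical toy world: named objects at integer positions."""
    objects: dict = field(default_factory=dict)        # position -> name
    announcements: list = field(default_factory=list)  # (position, message)

    def look(self, position, radius):
        """Return the objects within `radius` of `position`."""
        return {p: name for p, name in self.objects.items()
                if abs(p - position) <= radius}


@dataclass
class EmbodiedAgent:
    """Hypothetical agent that reaches the world only via its interface."""
    env: Environment
    position: int = 0
    history: list = field(default_factory=list)  # record of what was done

    def sense(self):
        """Sensor: collect whatever is immediately nearby."""
        seen = self.env.look(self.position, radius=1)
        self.history.append(("sense", seen))
        return seen

    def probe(self, focus, radius=5):
        """Probe: a tasked sensory activity governed by a focus."""
        seen = self.env.look(self.position, radius)
        found = {p: n for p, n in seen.items() if n == focus}
        self.history.append(("probe", focus, found))
        return found

    def move(self, step):
        """Effector: motion changes the agent's relation to the world."""
        self.position += step
        self.history.append(("move", step))

    def announce(self, message):
        """Effector: announce information to the environment."""
        self.env.announcements.append((self.position, message))
        self.history.append(("announce", message))

    def place(self, name):
        """Effector: change the environment (a stand-in for tool use)."""
        self.env.objects[self.position] = name
        self.history.append(("place", name))


if __name__ == "__main__":
    env = Environment(objects={3: "food", -4: "rock"})
    agent = EmbodiedAgent(env)
    print(agent.sense())          # passive sensing near the start position
    print(agent.probe("food"))    # focused search over a wider range
    agent.move(2)
    agent.announce("found food")
    print(env.announcements)
```

In this sketch the agent never reads `env.objects` directly; everything
it learns or causes passes through the five interface methods, and the
`history` list is a crude stand-in for the record of activity that an
embodied agent would need in order to reflect on its own behavior.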

Our  computational  version  of  the  language  and  movement 
theory  [3]  is  that  there  are  layers  of  symbol  systems  that 
separately  normalize  external  signals  into  interesting  in- 
formation spaces,  so  that  the  useful  processing  can  take 
place  in  those  spaces  (this  separation  is  the  analogue  of 
the  behavior  of  different  senses). 

The second is that the body is a barrier; we cannot reach
into the embodied system and turn switches when we want
something to happen [2]. Commands are only suggestions,
since we cannot force any particular kind of behavior on
the system. We can provide compelling reasons, and try to
arrange that we have programmed the system to respond
appropriately, but we can't enforce it. This is one of the
main differences we see between computer programs that
we want to call agents and ordinary computer programs.

The third is that embodiment is an explicit representation
of the "Integration Concept" for the "whole" animal. This
integration concept provides a unity to the whole system.
It is what turns a bag of parts into a coherently operating
individual entity. A body has many parts, but works as
one entity, integrated across disparate resources (unless it
is sick or dysfunctional).

In our opinion, some of the most important aspects of
autonomous systems revolve around this notion of
"embodiment", that is, the connection of the computation to
an entity in the physical world. Embodiment in this sense
has several essential properties:

•  a sense of environment: the environment has
properties and contains tools, objects, and other agents,
and makes certain actions available

•  a  sense  of  presence:  there  is  a  causal  connection 
to  and  from  the  environment,  so  that  the  use  of  tools 
and  other  activities  have  a  direct  impact  on  it 

•  a  sense  of  time:  it  is  important  for  the  agent  to 
remember  the  history  of  what  has  gone  before,  to  rec- 
ognize certain  kinds  of  event  patterns,  and  to  make 
predictions 

•  a  sense  of  place:  what  happens  and  what  exists  take 
place  in  some  locality,  which  has  a  notion  of  geogra- 
phy, connectivity  or  connection,  distance,  and  space 

•  a  sense  of  will:  autonomous  means  self-governing, 
so  an  autonomous  system  must  generate  its  own  goals, 
and  exhibit  a  sense  of  purpose  and  intention 

•  a  sense  of  self:  an  autonomous  system  needs  to  have 
a  notion  of  its  own  resources,  abilities,  and  internal 
state 

•  a  sense  of  perspective:  in  order  to  make  effective 
choices  and  recognize  opportunities  and  threats,  an 
autonomous  system  needs  a  viewpoint  of  itself  in  its 
environment 

These  qualities  are  analogous  to  the  well-known  "where, 
when,  what,  who"  of  descriptive  reporting,  the  "why"  of 
interpretation,  and  the  "me,  here"  of  local  viewpoints.  We 
have  organized  them  in  this  way  to  show  the  analogies. 

Current  research  work  on  MUDs  includes  ways  of  em- 
bodying intelligent  agents  and  studying  them  in  a  con- 
text where  humans  and  agents  can  interact.  A  number  of 
researchers  under  the  DARPA  CAETI  (Computer-Aided 
Education  and  Training  Initiative)  program  [1]  have  been 
embodying  evaluation  agents,  intelligent  tutors,  librarians, 
receptionists,  and  guide  agents  as  part  of  research  in  the 
use  of  MUDs  and  agents  in  education. 

2.3    Social  Intelligence 

Part  of  the  context  for  any  animal  is  the  presence  of  other 
individuals,  both  competing  species  and  members  of  its 
own  species.  If  we  imagine  the  jungle  of  the  Internet,  we 
can  well  envision  the  situation  in  which  one's  own  agents 
will  have  to  deal  not  only  with  well-behaved  conspecific 
agents,  but  all  sorts  of  tools  with  different  degrees  of  in- 
telligence and  behavior.  They  may  even  have  to  deal  with 
malicious  software  or  rogue  agents. 

Furthermore, even if we have agents that deal only with
carefully selected others, we need to develop much better
ideas of what it means to communicate and behave
cooperatively among agents. Many social scientists and now
a  few  computer  scientists  [6]  [8]  have  been  pointing  out 
for  some  time  that  cooperative  behaviors  -  shared  goals, 
work,  understandings,  communications  -  do  not  occur  in- 
side a  participant  alone.  In  addition  to  the  interpretations 
of  the  acts  or  symbols  within  each  participant  alone,  there 
is  also  an  act  of  negotiation  and  agreement  among  partic- 
ipants as  to  the  meaning  of  acts  (including  speech  acts). 
These  mutually-defined  meanings  are  necessary  in  order 
for  the  group  to  interact  even  on  the  simplest  levels,  e.g., 
to  determine  what  constitutes  the  desired  results  of  even 
individually  determined  actions,  or  what  constitutes  that 
an  action  has  been  "done"  in  order  for  another  agent  to 
proceed.  Clearly  these  "meanings"  can  be  built  in  (and 
in  fact  usually  are  in  most  agent  interactions,  in  the  form 
of  stopping  rules  or  predefined  preconditions  for  further 
activity),  but  eventually  we  want  to  have  agents  that  are 
"smart"  enough  to  act  more  autonomously  and  adaptively. 

A  key  aspect  of  being  able  to  negotiate  among  agents  turns 
out  to  be  the  ability  to  share  a  common  context  and  to  ob- 
serve the  other  behaviors  within  that  context.  The  notion 
of  niche  allows  us  to  create  places  within  which  an  agent's 
behaviors  are  visible  and  interpretable  both  by  itself  and 
by  others.  In  an  immediate  practical  sense,  this  increases 
the  "bandwidth"  of  information  that  others  can  process 
about  the  agent;  instead  of  just  relying  on  messages  and 
self-report, other agents can observe and infer usable
information from the agent's behavior. Eventually, we want to
use  such  behavioristic  information  to  form  the  basis  for  a 
shareable  semantics  -  a  real  semantics,  grounded  in  shared 
experience  within  this  niche  -  for  human-agent  communi- 
cation. 

2.4    Growth, Adaptation, and Development

Lastly, adaptation, like all other sets of mechanisms, will
itself be specialized to different worlds. We consider
adaptation to have two somewhat different aspects: one is the
construction  and  selection  of  variation  spaces  within  which 
the  adaptations  may  occur,  and  the  other  is  the  selection 
of  actions  within  those  spaces.  Biological  systems  seem 
to  be  able  to  generate  both  of  these  kinds  of  process  as 
needed,  at  least  to  a  limited  extent.  In  order  that  the 
system  can  respond  to  a  wide  dynamic  range  of  possible 
environmental  conditions,  a  very  broad  range  of  potential 
behaviors  must  be  available  to  the  system.  We  call  these 
collectively  the  "variation  space"  for  the  problem  at  hand, 
and  regard  the  problem  of  constructing  the  appropriate 
variation  spaces  as  the  fundamental  part  of  adaptation. 
Making  choices  once  the  choices  are  presented  is  not  as 
hard.

There must be processes that construct variation spaces
within  which  adaptation  can  take  place,  and  decision  pro- 
cesses that  can  make  adaptive  choices  within  those  spaces. 
We  want  algorithms  for  generating  appropriate  variation 
spaces  and  decision  processes.  We  want  change  processes 
to  make  evolutionary  behavior  automatic.  We  want  some 
changes  to  be  directed  by  history,  some  by  environment, 
and  some  by  intention. 
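
The two aspects of adaptation distinguished above can be made
concrete with a minimal sketch: one function constructs a variation
space, and a separate function chooses within it. The function
names, the representation of behaviors as strings, and the scoring
rule (mean past payoff) are hypothetical choices for illustration
only; the text does not prescribe any particular algorithm.

```python
from statistics import mean


def construct_variation_space(available, history, intended):
    """Build the set of candidate behaviors (the variation space).

    available: behaviors the environment currently makes possible
    history:   list of (behavior, payoff) pairs from past episodes
    intended:  behaviors suggested by the agent's current goals
    """
    past_successes = {b for b, payoff in history if payoff > 0}
    candidates = (past_successes | set(intended)) & set(available)
    return sorted(candidates)  # sorted so that ties break predictably


def select_action(space, history, default=0.0):
    """Choose one behavior from the space by mean past payoff."""
    if not space:
        return None

    def score(behavior):
        payoffs = [p for b, p in history if b == behavior]
        return mean(payoffs) if payoffs else default  # untried: default

    return max(space, key=score)


if __name__ == "__main__":
    history = [("search", 1.0), ("search", 0.5),
               ("wait", -1.0), ("retreat", 0.2)]
    available = {"wait", "search", "ask", "retreat"}
    intended = {"ask", "search"}

    space = construct_variation_space(available, history, intended)
    print(space, select_action(space, history))
```

Here history, environment, and intention each shape the space, while
selection is a comparatively simple step once the space exists, which
mirrors the claim that constructing the variation space is the harder
part of the problem.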

We  claim  that  the  purpose  of  adaptation  in  systems  is  to 
allow  them  to  be  effective  in  uncertain  dynamic  environ- 
ments, that  this  purpose  argues  for  complex  heterogeneous 
systems,  and  that  regardless  of  the  algorithms  used  for 
adaptation,  the  rest  of  the  organization  must  have  certain 
properties  for  adaptation  to  work.  Adaptation  requires 
flexibility;  every  flexibility  must  have  a  corresponding  co- 
ordination mechanism,  and  every  such  pairing  can  use  dif- 
ferent methods  for  adaptation.  In  fact,  every  such  pairing 
must  use  different  methods  for  adaptation,  at  least  super- 
ficially, since  their  context  and  scope  are  different.  This 
argument  leads  directly  to  the  requirement  for  multiplicity 
in  ALL  aspects  of  the  system,  including  the  basic  adapta- 
tion algorithms. 
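
The pairing principle (every flexibility must have a corresponding
coordination mechanism) can be illustrated by a registry that
refuses to accept a flexibility without a coordinator. The class
`FlexibleSystem`, its methods, and the sensor-gain example are
assumptions invented for this sketch, not part of the architecture
described in this paper.

```python
class FlexibleSystem:
    """Hypothetical registry of (flexibility, coordinator) pairs."""

    def __init__(self):
        self._pairs = {}  # name -> (list of options, coordinator)

    def add_flexibility(self, name, options, coordinator):
        """Register a set of options together with what manages them."""
        options = list(options)
        if not options:
            raise ValueError("a flexibility needs at least one option")
        if not callable(coordinator):
            raise TypeError("every flexibility needs a coordinator")
        self._pairs[name] = (options, coordinator)

    def resolve(self, name, context):
        """Let the paired coordinator choose an option for this context."""
        options, coordinator = self._pairs[name]
        choice = coordinator(options, context)
        if choice not in options:
            raise ValueError("coordinator chose outside its own space")
        return choice


def nearest_unit_gain(options, context):
    """Pick the gain that brings the scaled signal closest to 1.0."""
    return min(options, key=lambda g: abs(g * context["signal"] - 1.0))


if __name__ == "__main__":
    system = FlexibleSystem()
    system.add_flexibility("sensor_gain", [1, 10, 100], nearest_unit_gain)
    print(system.resolve("sensor_gain", {"signal": 0.1}))
```

Each registered pair may use a different coordinator, so different
flexibilities can be managed by different adaptation methods, as the
argument above requires.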

Adaptation  of  external  interaction  behavior  (input  collec- 
tion, motion,  and  communication),  using  internal  repre- 
sentations of  that  behavior  and  its  effects,  is  important 
for  real-world  embedding,  because  it  lets  the  system  ad- 
just the  dynamic  range  of  its  sensors  to  the  current  en- 
vironmental characteristics.  Adaptation  of  internal  repre- 
sentations (notations,  interpreters,  and  other  processes)  is 
the  same  process,  applied  to  the  internal  processing  in  the 
system. 

2.5    The  Use  of  Symbols 

Biological  entities  use  symbols  (mainly  chemical)  and  their 
interpretations  (mainly  through  chemical  processes)  to 
represent  and  transmit  knowledge  of  their  interior  and  ex- 
terior context  [3].  Over  time,  they  somehow  also  make 
interpretations  and  distinctions  at  increasing  depths  of  de- 
tail, and  use  previous  distinctions  to  define  contexts  that 
allow  them  to  focus  more  and  more  precisely.  Eventually, 
new  symbols  arise  in  context  via  their  use  in  interactions. 
Their  meaning  is  usually  inferred,  not  explicitly  transmit- 
ted. The  general  success  of  this  process  must  mean  that 
the  assumed  shared  experience  that  underlies  interaction 
is  sufficient.  With  computers  it  will  be  different;  there 
is  no  reason  to  believe  that  they  will  ever  have  the  same 
shared  experience,  so  we  must  examine  very  carefully  how 
we  can  provide  the  appropriate  replacement  for  the  con- 
text and  knowledge  structures  supplied  by  that  experience 
[15]. 

In particular, there is a fundamental difference in the way
symbols are grounded. Biological systems have symbols
grounded in the "real world". Computational systems
generally have their symbols grounded in a small discrete
system. We want to construct complex heterogeneous
computing systems that have increasingly interesting and
robust complexities of behavior, and we believe that this goal
requires  an  advance  in  computational  practice:  changing 
the  ways  in  which  symbols  are  represented  and  used  in 
computers.  We  must  therefore  define  computationally  vi- 
able theories  of  symbols  and  representation,  that  is,  ways 
for  programs  to  work  more  effectively  with  the  meanings 
that  underlie  the  labels  used  in  them,  given  the  relatively 
primitive  operations  on  those  labels  currently  available  to 
computing  systems.  We  believe  that  these  theories  cannot 
be  restricted  to  structures  built  from  indivisible  units,  and 
that  we  therefore  need  some  new  developments  in  Math- 
ematics to  treat  this  new  kind  of  semiotic  foundation  for 
systems  [11]  [16]. 

3  Prospects 

Autonomous  systems  must  be  complex  systems,  with  an 
ever-increasing  repertoire  of  possible  behaviors  and  pro- 
cesses for  selecting  them,  fallback  choices  to  account  for 
incorrect  situation  estimation,  and  quick  partial  solutions 
to  reduce  decision  time.  All  activity  is  situated,  strongly 
dependent  on  context,  and  there  need  to  be  different  de- 
cision processes  in  different  situations.  This  flexibility  re- 
quires an  architecture  in  which  many  parts  of  the  system 
are  infrastructure,  organizing  other  parts  of  the  system  to 
identify  and  address  problems,  and  monitoring  their  be- 
havior [7].  The  ability  to  reflect  on  one's  own  behavior 
(and  modify  it)  is  one  of  the  main  keys  to  effective  auton- 
omy. 

We believe that with proper attention to niches,
embodiment, social behaviors, and architectures supporting
adaptive and reflective behaviors, we can build agents that are
more useful partners in an increasingly complex
information environment.

Abstract

Intelligent  behavior  is  characterized  by 
flexible  and  creative  pursuit  of 
endogenously  defined  goals.  It  has 
emerged  in  humans  through  the  stages  of 
evolution  that  are  manifested  in  the  brains 
and  behaviors  of  the  vertebrate  series. 
Intentionality  is  a  key  concept  by  which 
to  link  brain  dynamics  to  goal-directed 
behavior.  The  archetypal  form  of 
intentional  behavior  is  an  act  of 
observation  into  time  and  space,  by 
which  information  is  sought  for  the 
guidance  of  future  action. 

The  neurodynamics  of 
intentionality  in  the  process  of 
observation 

The  first  step  in  pursuit  of  an 
understanding  of  intentionality  is  to  ask, 
what  happens  in  brains  during  an  act  of 
observation?  This  is  not  a  passive  receipt 
of  information  from  the  world.  It  is  a 
purposive  action  by  which  an  observer 
directs  the  sense  organs  toward  a  selected 
aspect  of  the  world  and  interprets  the 
resulting  barrage  of  sensory  stimuli.  The 
concept  of  intentionality  has  been  used  to 
describe  this  process  in  different 
contexts,  since  its  first  use  by  Aquinas 
700  years  ago.  The  three  salient 
characteristics  of  intentionality  as  it  is 
treated  here  are  (a)  intent  or  directedness 
toward  some  future  state  or  goal,  (b) 
wholeness,  and  (c)  unity  (Freeman 
1995).  These  three  aspects  correspond  to 
use  of  the  term  in  psychology  with  the 
meaning  of  purpose,  in  medicine  with  the 
meaning  of  mode  of  healing  and 
integration  of  the  body,  and  in  analytic 
philosophy  with  the  meaning  of  the  way 
in which beliefs and thoughts are
connected with ("about") objects and
events in the world.

Intent  comprises  the  endogenous 
initiation,  construction,  and  direction  of 
behavior  into  the  world.  It  emerges  from 
brains.  Humans  and  animals  select  their 
own  goals,  plan  their  own  tactics,  and 
choose  when  to  begin,  modify,  and  stop 
sequences  of  action,  and  humans  at  least 
are  subjectively  aware  of  themselves 
acting.  Unity  appears  in  the  combining 
of  input  from  all  sensory  modalities  into 
Gestalten,  in  the  coordination  of  all  parts 
of  the  body,  both  musculoskeletal  and 
autonomic,  into  adaptive,  flexible,  yet 
focused  movements.  Subjectively,  unity 
appears  in  the  awareness  of  self. 
Wholeness  is  revealed  by  the  orderly 
changes  in  the  self  and  its  behavior  that 
constitute  the  development  and  maturation 
of  the  self,  within  the  constraints  of  its 
genes  and  its  material,  social  and  cultural 
environments.  Subjectively,  wholeness 
is  revealed  in  the  remembrance  of  self 
through  a  lifetime  of  change. 

The  limbic  system  is  the  organ  of 
intentional  behavior 

Brain  scientists  have  known  for  over  a 
century  that  the  necessary  and  sufficient 
part  of  the  vertebrate  brain  to  sustain 
minimal  intentional  behavior  is  the  ventral 
forebrain,  including  those  components 
that  comprise  the  external  shell  of  the 
phylogenetically  oldest  part  of  the 
forebrain,  the  paleocortex,  and  the  deeper 
lying  nuclei  with  which  the  cortex  is 
connected.  These  components  suffice  to 
support  remarkably  adept  patterns  of 
intentional  behavior,  in  dogs  after  all  the 
newer  parts  of  the  forebrain  have  been 
surgically removed (Goltz 1892), and in
rats with neocortex chemically inactivated
by  spreading  depression  (Bures  et  al. 
1974).  Intentional  behavior  is  severely 
altered  or  absent  after  major  damage  to 
the  basal  forebrain,  as  manifested  most 
clearly  in  Alzheimer's  disease. 

Phylogenetic  evidence  comes  from 
observing  intentional  behavior  in 
salamanders,  which  have  the  simplest  of 
the  existing  vertebrate  forebrains  (Herrick 
1948;  Roth  1987).  The  three  parts  are 
sensory  (which,  as  in  small  mammals,  is 
predominantly  olfactory),  motor,  and 
associational.  The  latter  part  contains  the 
primordial  hippocampus  with  its 
associated  septoamygdaloid  and  striatal 
nuclei,  which  are  identified  in  higher 
vertebrates  as  the  locus  of  the  functions 
of  spatial  orientation  (the  "cognitive 
map")  and  temporal  integration  in 
learning  (the  organization  of  long  and 
short  term  memory).  These  processes  are 
essential,  inasmuch  as  intentional  action 
takes  place  into  the  world,  and  even  the 
simplest  action,  such  as  searching  for 
food  or  evading  predators,  requires  an 
animal  to  know  where  it  is  with  respect  to 
its  world,  where  its  prey  or  refuge  is,  and 
what  its  spatial  and  temporal  progress  is 
during sequences of attack or escape.

Neurodynamics  of  intentionality 

The  crucial  question  for  neuroscientists 
is,  how  are  the  patterns  of  neural  activity 
that  sustain  intentional  behavior  created  in 
brains?  The  answer  is  provided  by 
studies  of  electrical  activity  of  the  primary 
sensory cortices of animals trained to
respond  to  conditioned  stimuli  (Freeman 
1975,  1992,  1995;  Barrie  et  al.  1996; 
Kay  et  al.  1996).  Cortical  neurons  are 
selectively  activated  by  sensory  receptors 
to  generate  microscopic  activity.  By 
interactions  among  the  cortical  neurons  a 
population  forms  that  "binds"  their 
activity  into  a  macroscopic  pattern  (Haken 
1983;  Gray  1994;  Hardcastle  1994; 
Singer  and  Gray  1995).  The  brain 
activity  patterns  that  are  seen  in 
electroencephalograms  (EEGs)  reveal  the 
macroscopic  brain  states  that  are  triggered 
or  induced  by  the  arrival  of  stimuli. 


These  brain  states  are  not  representations 
of  stimuli,  nor  are  they  the  simple  effects 
caused  by  stimuli.  Each  learned  stimulus 
serves  to  elicit  the  construction  of  a 
pattern  that  is  shaped  by  the  synaptic 
modifications  among  cortical  neurons 
from  prior  learning,  and  also  by  the  brain 
stem  nuclei  that  bathe  the  forebrain  in 
neuromodulator  chemicals.  It  is  a 
dynamic  action  pattern  that  creates  and 
carries  the  meanings  of  stimuli  for  the 
animal.  It  reflects  the  individual  history, 
present  context,  and  expectancy, 
corresponding  to  the  unity  and  the 
wholeness  of  the  intentionality.  The 
patterns  created  in  each  cortex  are  unique 
to  each  animal.  All  sensory  cortices 
transmit  their  signals  into  the  limbic 
system,  where  they  are  integrated  with 
each  other  over  time,  and  the  resultant 
integrated  meaning  is  transmitted  back  to 
the  cortices  in  the  processes  of  selective 
attending,  expectancy,  and  the  prediction 
of  future  inputs. 

The  same  kinds  of  EEG  activity  as  those 
found  in  the  sensory  and  motor  cortices 
are  found  in  various  parts  of  the  limbic 
system.  This  discovery  indicates  that  the 
limbic  system  also  has  the  capacity  to 
create  its  own  spatiotemporal  patterns  of 
neural  activity.  They  are  related  to  past 
experience  and  convergent  multisensory 
input,  but  they  are  self-organized.  The 
limbic  system  provides  a  neural  matrix  of 
interconnections,  that  serves  to  generate 
continually  the  neural  activity  that  forms 
goals  and  directs  behavior  toward  them. 
EEG  evidence  shows  that  the  process 
occurs  in  discontinuous  steps,  like  frames 
in  a  motion  picture.  Each  step  follows  a 
dynamic  state  transition,  in  which  a 
complex  assembly  of  neuron  populations 
jumps  suddenly  from  one  spatiotemporal 
pattern  to  the  next,  as  the  behavior 
evolves.  Being  intrinsically  unstable,  the 
limbic  system  continually  transits  across 
states  that  emerge,  spread  into  other  parts 
of  the  brain,  and  then  dissolve  to  give  rise 
to  new  ones.  Its  output  controls  the  brain 
stem  nuclei  that  serve  to  regulate  its 
excitability  levels,  implying  that  it 
regulates  its  own  neurohumoral  context, 
enabling it to respond with equal facility
to changes that call for arousal and
adaptation  or  rest  and  recreation,  both  in 
the  body  and  the  environment.  It  is  the 
neurodynamics  of  the  limbic  system, 
assisted  by  other  parts  of  the  forebrain 
such  as  the  frontal  lobes,  that  initiates  the 
novel  and  creative  behavior  seen  in  search 
by  trial  and  error. 

The  limbic  activity  patterns  of  directed 
arousal  and  search  are  sent  into  the  motor 
systems  of  the  brain  stem  and  spinal 
cord.  Simultaneously,  patterns  are 
transmitted  to  the  primary  sensory 
cortices,  preparing  them  for  the 
consequences  of  motor  actions.  This 
process  has  been  called  "reafference" 
(von  Holst  and  Mittelstaedt  1950;
Freeman  1995),  "corollary  discharge" 
(Sperry  1950),  "focused  arousal"  (Sheer 
1989),  and  "preafference"  (Kay  et  al. 
1996).  It  sensitizes  sensory  systems  to 
anticipated  stimuli  prior  to  their  expected 
times  of  arrival.  Sensory  cortical
constructs  consist  of  brief  staccato 
messages  to  the  limbic  system,  which 
convey  what  is  sought  and  the  result  of 
the  search.  After  multisensory 
convergence,  the  spatiotemporal  activity 
pattern  in  the  limbic  system  is  up-dated 
through  temporal  integration  in  the 
hippocampus.  Between  sensory 
messages  there  are  return  up-dates  from 
the  limbic  system  to  the  sensory  cortices, 
whereby  each  cortex  receives  input  that 
has  been  integrated  with  the  output  of  the 
others,  reflecting  the  unity  of 
intentionality.  Everything  that  a  human 
or  an  animal  knows  comes  from  this 
iterative  circular  process  of  action, 
reafference,  perception,  and  up-date.  It  is 
done  by  successive  frames  that  involve 
repeated  state  transitions  and  self- 
organized  constructs  in  the  sensory  and 
limbic  cortices.  This  neurodynamic 
system  is  defined  here  as  the  "limbic  self"
in  the  brain  of  an  individual,  where 
intentional  behavior  is  created,  with  help 
from  other  parts  of  the  forebrain. 

An  act  of  observation  comprises  Aquinas' 
intentional  action  of  "stretching  forth"  and 
learning  from  the  consequences,  and  the 
existential  "action-perception  cycle"  of 


Merleau-Ponty  (1942).  It  corresponds  to 
Piaget's  (1930)  cycle  of  "action, 
assimilation,  and  adaptation"  in  the 
sensorimotor  stage  of  childhood 
development.  His  postulated  sequences 
of  equilibrium,  disequilibrium,  and  re- 
equilibration  conform  to  state  transitions 
in  brain  dynamics,  which  initiate  and 
sustain  action,  construct  dynamic  patterns 
in  the  sensory  cortices,  and  up-date  the 
limbic  patterns  by  modifying  synapses  in 
the  learning  that  follows  the  sensory 
consequences  of  intended  actions.  For 
Piaget,  cause  and  effect  are  chains  of 
events  that  have  the  appearance  of  linkage 
corresponding  to  the  unfolding 
experience  of  that  exploration,  by  which  a 
child  is  trying  to  make  sense  of  its  world 
by  manipulating  objects  in  it.  The  origin 
of  causal  inference  is  buried  deeply  in  the 
pre-linguistic  exploratory  experience  of 
each  of  us.  It  is  not  easily  accessed  by 
cognitive  analysis  or  introspection. 

We  are  all  aware  of  our  acts  of 
observation.  It  is  partly  by  expectation  of 
what  we  are  looking  for  through 
reafference,  partly  by  perceiving  the 
changes  that  our  actions  make  in  the 
dispositions  of  our  bodies  through 
proprioception,  and  partly  by  our 
selection  of  stimuli  from  the  environment 
through  exteroception.  We  perceive  our 
intentional  acts  as  the  "causes"  of  changes 
in  our  perceptions,  and  the  subsequent 
changes  in  our  bodies  as  "effects" 
(Freeman  1995).  If  this  hypothesis  of 
limbic  dynamics  is  correct,  then 
everything  that  we  know  we  have  learned 
through  the  action-perception  cycle,  and 
the  iterative  state  changes  by  which  it  is 
produced  in  brains. 

Characteristics  of  brain  states 

The  "state"  of  the  brain  is  a  description  of 
what  it  is  doing  in  some  specified  time 
period.  A  state  transition  occurs  when 
the  brain  changes  and  does  something 
else.  For  example,  locomotion  is  a  state, 
within  which  walking  is  a  rhythmic 
pattern  of  activity  that  involves  large  parts 
of  the  brain,  spinal  cord,  muscles  and 
bones.  The  entire  neuromuscular  system 
changes  almost  instantly  with  the
transition  to  a  pattern  of  jogging  or 
running.  Similarly,  a  sleeping  state  can 
be  taken  as  a  whole,  or  divided  into  a 
sequence  of  slow  wave  and  REM  stages. 
Transit  to  a  waking  state  can  occur  in  a 
fraction  of  a  second,  whereby  the  entire 
brain  and  body  shift  gears,  so  to  speak. 
The  state  of  a  neuron  can  be  described  as 
active  and  firing  or  as  silent,  with  sudden 
changes  in  the  firing  manifesting  state 
transitions.  Populations  of  neurons  also 
have  a  range  of  states,  such  as  slow 
wave,  fast  activity,  seizure,  or  silence. 
The  science  of  dynamics  is  designed  to 
study  states  and  their  transitions. 

The  most  critical  question  to  ask  about  a 
state  is  its  degree  of  stability  or  resistance 
to  change.  Evaluation  is  done  by 
perturbing  an  object  or  a  system 
(Freeman  1975).  For  example,  an  object 
like  an  egg  on  a  flat  surface  is  unstable, 
but  a  coffee  mug  is  stable.  A  person 
standing  on  a  moving  bus  and  holding  on 
to  a  railing  is  stable,  but  someone 
walking  in  the  aisle  is  not.  If  a  person 
regains  his  chosen  posture  after  each 
perturbation,  no  matter  in  which  direction 
the  displacement  occurred,  that  state  is 
regarded  as  stable,  and  it  is  said  to  be 
governed  by  an  attractor.  This  is  a 
metaphor  to  say  that  the  system  goes  ("is 
attracted")  to  the  state  through  an  interim 
state  of  transiency.  The  range  of 
displacement  from  which  recovery  can 
occur  defines  the  basin  of  attraction,  in 
analogy  to  a  ball  rolling  to  the  bottom  of  a 
bowl.  If  the  perturbation  is  so  strong  that 
it  causes  concussion  or  a  broken  leg,  and 
the  person  cannot  stand  up  again,  then  the 
system  has  been  placed  outside  the  basin 
of  attraction,  and  a  new  state  supervenes 
with  its  own  attractor  and  basin. 
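
The notions of attractor and basin can be illustrated with a toy system. The sketch below is not a brain model: it assumes a one-dimensional system dx/dt = x - x^3 (chosen only for illustration), which has two point attractors at x = +1 and x = -1 whose basins meet at x = 0. A small perturbation is followed by recovery; a large one places the state in the other basin, where a different attractor takes over.

```python
def settle(x0, dt=0.01, steps=5000):
    """Integrate dx/dt = x - x**3 with forward Euler and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x


def perturb_and_settle(displacement):
    """Start at the attractor x = +1, displace the state, and let it relax."""
    return settle(1.0 + displacement)


# Small perturbation: the state stays inside the basin of x = +1 and recovers.
small = perturb_and_settle(-0.5)

# Large perturbation: the state crosses x = 0 and relaxes to x = -1 instead.
large = perturb_and_settle(-1.5)

print(round(small, 3), round(large, 3))
```

The range of displacements from which `settle` returns to +1 is the basin of attraction of that state in this toy system.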

Stability  is  always  relative  to  the  time 
duration  of  observation  and  the  criteria 
for  what  is  chosen  to  be  observed.  In  the 
perspective  of  a  lifetime,  brains  appear  to 
be  highly  stable,  in  their  numbers  of 
neurons,  their  architectures  and  major 
patterns  of  connection,  and  in  the  patterns 
of  behavior  they  produce,  including  the 
character  and  identity  of  the  individual 


that  can  be  recognized  and  followed  for 
many  years.  Brains  undergo  repeated 
transitions  from  waking  to  sleeping  and 
back  again,  coming  up  refreshed  with  a 
good  night  or  irritable  with  insomnia,  but 
still,  giving  the  same  persons  as  the  night 
before.  Personal  identity  is  usually  quite 
stable.  But  in  the  perspective  of  the  short 
term,  brains  are  highly  unstable. 
Thoughts  go  fleeting  through  awareness, 
and  the  face  and  body  twitch  with  the 
passing  of  emotions.  Glimpses  of  their 
internal  states  of  neural  activity  reveal 
patterns  that  are  more  like  hurricanes  than 
the  orderly  march  of  symbols  in  a 
computer.  Brain  states  and  the  states  of 
populations  of  neurons  that  interact  to 
give  brain  function,  are  highly  irregular  in 
spatial  form  and  time  course.  They 
emerge,  persist  for  a  small  fraction  of  a 
second,  then  disappear  and  are  replaced
by  other  states. 

In  using  dynamics  we  approach  the 
problem  by  defining  three  kinds  of  stable 
state,  each  with  its  type  of  attractor.  The 
simplest  is  the  point  attractor.  The 
system  is  at  rest  unless  perturbed,  and  it 
returns  to  rest  when  allowed  to  do  so.  As 
it  relaxes  to  rest,  it  has  the  history  of  what 
happened,  but  that  history  is  lost  after 
convergence  to  rest.  Examples  of  point 
attractors  are  silent  neurons  or  neural 
populations  that  have  been  isolated  from 
the  brain,  and  also  the  brain  that  is 
depressed  into  inactivity  by  injury  or  a 
strong  anesthetic,  to  the  point  where  the 
EEG  has  gone  flat.  A  special  case  of  a 
point  attractor  is  noise.  This  state  is 
observed  in  populations  of  neurons  in  the 
brain  of  a  subject  at  rest,  with  no 
evidence  of  overt  behavior.  The  neurons 
fire  continually  but  not  in  concert  with 
each  other.  Their  pulses  occur  in  long 
trains  at  irregular  times.  Knowledge 
about  the  prior  pulse  trains  from  each 
neuron  and  those  of  its  neighbors  up  to 
the  present  fails  to  support  the  prediction 
of  when  the  next  pulse  will  occur.  The 
state  of  noise  has  continual  activity  with 
no  history  of  how  it  started,  and  it  gives 
only  the  expectation  that  its  amplitude  and 
other  statistical  properties  will  persist 
unchanged. 

A  system  that  gives  periodic  behavior  is
said  to  have  a  limit  cycle  attractor.  The 
classic  example  is  the  clock.  When  it  is 
viewed  in  terms  of  its  ceaseless  motion,  it 
is  regarded  as  unstable  until  it  winds 
down,  runs  out  of  power,  and  goes  to  a 
point  attractor.  If  it  resumes  its  regular 
beat  after  it  is  re-set  or  otherwise 
perturbed,  it  is  stable  as  long  as  its  power 
lasts.  Its  history  is  limited  to  one  cycle, 
after  which  there  is  no  retention  of  its 
transient  approach  in  its  basin  to  its 
attractor.  Neurons  and  populations  rarely 
fire  periodically,  and  when  they  appear  to 
do  so,  close  inspection  shows  that  the 
activities  are  in  fact  irregular  and 
unpredictable  in  detail,  and  when  periodic 
activity  does  occur,  it  is  either  intentional, 
as  in  rhythmic  drumming,  or 
pathological,  as  in  nystagmus  and 
Parkinsonian  tremor. 

The  third  type  of  attractor  gives  aperiodic 
oscillation  of  the  kind  that  is  observed  in 
recordings  of  EEGs.  There  is  no  one  or 
small  number  of  frequencies  at  which  the 
system  oscillates.  The  system  behavior  is 
therefore  unpredictable,  because 
performance  can  only  be  projected  far 
into  the  future  for  periodic  behavior. 
This  type  was  first  called  "strange";  it  is 
now  widely  known  as  "chaotic".  The 
existence  of  this  type  of  oscillation  was 
known  to  Poincaré  a  century  ago,  but
systematic  study  was  possible  only 
recently  after  the  full  development  of 
digital  computers.  The  best  known 
simple  systems  with  chaotic  attractors 
have  a  small  number  of  components  and  a 
few  degrees  of  freedom,  as  for  example, 
the  double-hinged  pendulum,  and  the 
dripping  faucet.  Large  and  complex 
systems  such  as  neurons  and  neural 
populations  are  thought  to  be  capable  of 
chaotic  behavior,  but  proof  is  not  yet 
possible  at  the  present  level  of 
developments  in  mathematics. 

The  discovery  of  chaos  has  profound 
implications  for  the  study  of  brain 
function  (Skarda  and  Freeman  1987).  A 
chaotic  system  has  the  capacity  to  create 
novel  and  unexpected  patterns  of  activity. 


It  can  jump  instantly  from  one  mode  of 
behavior  to  another,  which  manifests  the 
facts  that  it  has  a  collection  of  attractors, 
each  with  its  basin,  and  that  it  can  move 
from  one  to  another  in  an  itinerant 
trajectory  (Tsuda  1996).  It  retains  in  its 
pathway  across  its  basins  its  history, 
which  fades  into  its  past,  just  as  its 
predictability  into  its  future  decreases. 
Transitions  between  chaotic  states 
constitute  the  dynamics  we  need  to 
understand  how  brains  do  what  they  do. 

The  cortical  state  transition  is 
the  elemental  step  of 
intentionality 

Systems  such  as  neurons  and  brains  that 
have  multiple  chaotic  attractors  also  have 
point  and  limit  cycle  attractors.  A  system  that
is  in  the  basin  of  one  of  its  chaotic 
attractors  is  legendary  for  the  sensitivity 
to  what  are  called  the  "initial  conditions". 
This  refers  to  the  way  in  which  a  simple 
system  is  placed  into  the  basin  of  one  of 
its  attractors.  If  the  basin  is  that  of  a 
point  or  a  limit  cycle  attractor,  the  system 
proceeds  predictably  to  the  same  end 
state.  If  the  basin  leads  to  a  chaotic 
attractor,  the  system  goes  into  ceaseless 
fluctuation,  as  long  as  its  energy  lasts.  If 
the  starting  point  is  identical  on  repeated 
trials,  which  can  only  be  assured  by 
simulation  of  the  dynamics  on  a  digital 
computer,  the  same  aperiodic  behavior 
appears.  This  is  why  chaos  is  sometimes 
called  "deterministic".  If  the  starting 
point  is  changed  by  an  arbitrarily  small 
amount,  although  the  system  is  still  in  the 
same  basin,  the  trajectory  is  not  identical. 
If  the  difference  in  starting  conditions  is 
too  small  to  be  originally  detected,  it  can 
be  inferred  from  the  unfolding  behavior 
of  the  system,  as  the  difference  in 
trajectories  becomes  apparent.  This 
observation  shows  that  a  chaotic  system 
has  the  capacity  to  create  information  in 
the  course  of  continually  constructing  its 
own  trajectory  into  the  future. 
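
Sensitivity to initial conditions can be demonstrated numerically. The sketch below uses the logistic map x -> 4x(1 - x), a standard textbook example of deterministic chaos that stands in here for the far more complex cortical dynamics; the starting value 0.2 and the offset 1e-12 are arbitrary illustrative choices.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return all states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Identical starting points on a digital computer give identical trajectories.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2, 60)

# An offset far too small to notice at the start...
c = logistic_trajectory(0.2 + 1e-12, 60)

# ...is invisible early on but dominates the later behavior.
early_gap = abs(a[5] - c[5])
late_gap = max(abs(p - q) for p, q in zip(a[40:], c[40:]))

print(a == b, early_gap, late_gap)
```

The initial difference cannot be seen in the first few steps, yet it can be inferred later from the diverging trajectories, which is the sense in which the text says the system creates information.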

In  each  sensory  cortex  there  are  multiple 
basins  corresponding  to  previously 
learned  classes  of  stimuli,  as  well  as  to 
the  unstimulated  state.    This  chaotic 
prestimulus  state  of  expectancy
establishes  the  sensitivity  of  the  cortex, 
so  that  the  very  small  number  of  sensory 
action  potentials  evoked  by  an  expected 
stimulus  can  carry  the  cortical  trajectory 
into  the  basin  of  an  appropriate  attractor. 
The  stimulus  is  selected  by  the  limbic 
brain  through  orientation  of  the  sensory 
receptors  by  sniffing,  looking,  and 
listening.  The  basins  of  attraction  are 
shaped  by  limbic  input  to  sensitize  the 
reception  of  a  desired  class  of  stimuli. 
The  web  of  synaptic  connections  that  was 
modified  by  prior  learning  contributes  to 
the  formation  of  basins  and  to  the 
attractors.  This  is  an  act  of  observation. 

ABSTRACT

Computer  hypermedia  technologies  offer  significant 
possibilities  for  integrating  data,  information  and 
multifaceted  knowledge  resources  abounding  in  existing 
and  next  generation  plant  operations.  A  hypermedia 
system  may  be  viewed  as  a  set  of  nodes  and  links 
allowing  non-linear  access  to  plant  information  residing 
in  computers  regardless  of  format.  The  process  of 
accessing  information  in  hypermedia  systems  is  known 
as  navigation.  After  reviewing  the  state  of  the  art  we 
present  quantitative  criteria  for  the  development  of 
hypermedia  databases  and  a  fuzzy  graph-based 
methodology  for  navigating  the  large  information  spaces 
involved  in  nuclear  plant  operations.  In  the  developed 
methodology  membership  functions  embodying  context- 
dependent  criteria  provide  application-specific  tools  for 
navigation.  The  methodology  is  illustrated  through 
numerical  examples  and  a  HyperCard-based  prototypical 
system  for  monitoring  special  material  in  a  next 
generation  plant. 

KEYWORDS:  hypermedia  databases,  fuzzy  graphs, 
navigation  systems,  information  systems 

1.  INTRODUCTION 

Among  the  various  aspects  of  plant  operations, 
the  maintenance  and  upgrade  of  stored  records  and 
information  resources  is  probably  the  least  urgent  on  a 
daily  basis,  yet  one  of  the  most  important  contributors  to 
life  long  performance  and  efficiency.  Indeed  the 
outcome  of  records  and  other  information  resources 
maintenance  and  utilization  has  far  reaching  implications 
for  the  efficiency  of  operations,  the  integrity  of  safety 
systems,  the  effectiveness  of  training  and  the  overall 
performance  of  plant  technical,  operations  and 
managerial  personnel.  Hypermedia  and  the  related  field 
of  virtual  environments,  once  thought  of  as  science 
fiction-like  technologies,  are  opening  new  ways  for 
improving  a  variety  of  tasks  including,  but  not  limited  to, 
enhancement  of  training,  outage  planning,  and 
component  inspections  in  hard  to  reach  areas.1,2  In 
essence,  hypermedia  allows  for  a  more  flexible  encoding 
and  utilization  of  plant  records  and  human  expertise 
through  the  integration  of  textual,  visual  and  audible 
data,  with  a  variety  of  plant  information  and  knowledge 
resources  residing  in  networked  computers. 


It  is  technically  feasible  and  possibly 
advantageous  that  records  such  as  design  drawings, 
scientific  and  engineering  calculations,  safety  analysis 
reports,  technical  specifications,  purchasing  and 
accounting  data,  and  audit  reports,  although  stored  in 
different  formats  and  platforms,  can  be  integrated  and 
transparently  available  to  users  having  a  diversity  of 
backgrounds  and  plant  functions.  For  example,  through 
a  plant  hypermedia  system,  a  safety  system  may  be 
viewed  not  only  as  a  physical  object,  but  a  programmable 
computer  object  possessing  a  number  of  distinct,  yet 
equally  valid  representations,  e.g.,  relations  with 
neighboring  objects  (pipes,  valves  etc.),  technical 
specifications,  manufacturing  data,  photos,  maintenance 
records,  location  video,  or  acoustic  data  signatures. 

The  driving  force  for  hypermedia  integration 
comes  from  the  rapid  ongoing  advances  in  distributed 
(networked)  computer  systems  transparently  sharing 
memory  and  software  resources.  Computer  storage  in 
the  order  of  gigabytes  (10^9  bytes)  and  network
communication  speeds  in  the  order  of  Mbauds  (10^6
bits/sec)  are  at  present  readily  available;  the  world  wide 
web  and  hypermedia  browsers  such  as  Mosaic,  Netscape 
and  Internet  Explorer  are  familiar  tools  in  the  vast  global 
networks  of  computers  opening  new  commercial 
activities  around  the  globe.  Hence,  it  is  now  apparent 
that  distributed  hypermedia  databases  will  be  of  great 
benefit  in,  amongst  others,  tracking  the  entire  history  of 
plant  components,  enhancing  training,  improving 
planning  and  the  overall  conduct  of  plant  operations. 

In  hypermedia,  information  is  organized  as
networks  of  nodes  connected  by  links.  A  node  is  the 
smallest  unit  of  meaningful  information  to  which  a 
hypermedia  link  is  made.  For  example,  a  piece  of  text,  a 
photograph,  a  computer  screen,  a  2-minute  video 
segment,  and  a  maintenance  record  of  an  emergency 
diesel  generator,  can  all  be  thought  of  as  nodes.  Nodes 
may  contain  text,  graphics,  audio,  video  and  software  for 
operating  on  numerical  and/or  symbolic  data.  A  link  is  a 
connection  between  information  stored  in  two  different 
nodes.  A  group  of  links  that  have  related  functions  is 
referred  to  as  a  link  family,  and  most  hypermedia  systems 
feature  link  buttons  (link  icons)  which  can  be  arbitrarily 
embedded  within  the  content  material. 
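
The node, link, and link-family vocabulary above maps naturally onto a small data structure. The following is a minimal sketch, not the structure of any particular hypermedia system; the class names, the `media` field, and the example node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    media: str          # e.g. "text", "photo", "video" (illustrative labels)
    content: str = ""


@dataclass
class Link:
    source: str
    target: str
    family: str         # groups links that have related functions


@dataclass
class Hypermedia:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_link(self, source, target, family):
        # A link connects information stored in two different, existing nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        if source == target:
            raise ValueError("a link connects two different nodes")
        self.links.append(Link(source, target, family))

    def neighbors(self, name):
        """Nodes reachable from `name` by following one link."""
        return [link.target for link in self.links if link.source == name]

    def link_family(self, family):
        """All links that belong to the given link family."""
        return [link for link in self.links if link.family == family]


# Hypothetical nodes for an emergency diesel generator (EDG).
plant = Hypermedia()
plant.add_node(Node("edg_record", "text", "EDG maintenance record"))
plant.add_node(Node("edg_photo", "photo"))
plant.add_node(Node("edg_video", "video", "2-minute video segment"))
plant.add_link("edg_record", "edg_photo", family="illustration")
plant.add_link("edg_record", "edg_video", family="illustration")
plant.add_link("edg_photo", "edg_record", family="back-reference")

print(plant.neighbors("edg_record"))
```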

Yet,  while  it  is  feasible  to  store  in  hypermedia 
databases,  a  multitude  of  design,  maintenance  and 
operations  data,  as  well  as  plant  information  and
knowledge  to  an  unprecedented  extent,  user 
disorientation  and  possibly  confusion  become  the 
limiting  factors  in  their  efficient  utilization.  Increasing 
the  number  of  connections,  or  links,  increases  the 
possibility  that  a  user  will  get  lost  in  irrelevant 
information;  a  situation  proverbially  known  as  the  "lost 
in  hyperspace  problem."  User  disorientation  may  be 
particularly  severe  for  large  scale  applications  such  as 
those  involved  in  nuclear  plants  and  given  the  critical 
nature  of  many  plant  operations,  it  may  actually  be  a
prohibitive  bottleneck  as  far  as  the  field  deployment  of 
the  technology  is  concerned. 

2.  REVIEW  OF  HYPERMEDIA 

Hypermedia  allows  for  easy  and  intuitive  access 
to  documents  and  programs  by  linking  dispersed  yet 
interrelated  information  throughout  a  document,  a 
program  or  a  series  of  documents  and  programs. 
Conventional  information  structures  use  a  hierarchical  or 
sequential  logic,  i.e.,  there  is  a  single  linear  sequence 
defining  the  order  in  which  text  is  to  be  accessed.  The 
typical  design  report  of  a  plant  provides  an  example  of  a 
hierarchical  structure.  The  report  has  several  chapters; 
each  chapter  is  broken  to  several  numbered  sections; 
each  section  has  one  or  more  numbered  subsections;  each 
subsection  may  have  several  numbered  sub-subsections 
and  so  on.  Figure  1  illustrates  the  way  information  is 
structured  in  Chapter  10  Materials  Handling  System  of  a 
Plant  Design  Report.  To  access  and  understand  10.3
Plant  Product,  one  has  to  proceed  from  10.2
Processing,  and  to  understand  the  description  of  the
10.2.2  Waste  Safety  Evaluation,  one  has  to  proceed  from
10.2.1  Waste  Product  Descriptions  through  10.2.1.1,
10.2.1.2  and  10.2.1.3,  solid,  liquid,  and  gaseous  wastes 
respectively.  There  are  two  special  articles  (shown 
within  the  circular  nodes)  in  the  information  structure  of 
figure  1:  the  Table  of  Contents  (TOC)  and  the  References
(REFS).  TOC  may  be  considered  an  index  node  (a  node 
pointing  to  other  nodes),  while  REFS  may  be  a  reference 
node  (a  node  pointed  to  by  several  other  nodes). 

Contrary  to  the  hierarchical  structure  of 
conventional  information  structures,  hypermedia 
information  is  structured  in  a  non-sequential  or 
associational  manner.  An  associational  information 
structure  allows  a  user  to  go  from  a  node  to  any  other 
node  directly  or  through  intermediate  nodes  without 
having  to  observe  the  linear  sequence  implicit  in  figure  1. 
Hypermedia  presents  several  options  to  the  user  who 
determines  which  one  of  them  to  follow  largely  based  on 
specific  informational  needs.  This  nonsequential 
organization  enables  one  to  store  and  retrieve 
information  in  a  more  flexible  manner. 

Research  and  development  in  the  field  of 
hypermedia  have  made  considerable  progress  in  the  past 
decade,  steadily  moving  towards  more  complex 
applications  including,  but  not  limited  to,  pedagogical 


projects  in  numerous  fields,6,7,8,9  managing  and
presenting  documentation,10,11  learning  and  classroom
applications,  information  retrieval  and  indexing,
industrial  applications,15,16  electronic  journal  publishing,17
and  mass  media  applications.18,19,20,21,22,23  In  several
successful  projects,  hypermedia  systems  integrating  an 
impressive  corpus  of  material  related  to  academic  fields 
of  studies  and  encyclopedic  knowledge  have  been 
developed.  Results  from  a  number  of  evaluations  in 
different  learning  environments  have  demonstrated  that 
hypermedia  possesses  an  important  edge  for  accelerating
learning  and  for  supporting  new  types  of  learning  and 
teaching  (see  ref.  6). 

In  Germany,  hypermedia  systems  supporting 
training  are  currently  in  the  process  of  being  integrated 
into  existing  training  systems  at  several  Power  Stations. 
Since  1994,  various  hypermedia  systems  have  been 
developed  and  applied  to  Computer  Based  Training 
(CBT)  for  plant  personnel  in  subjects  such  as
thermodynamics,  electrical  engineering,  and  operations 
in  specific  plants.  For  example,  a  hypermedia  learning 
program  about  water  hammers  in  pipes  integrating  text, 
video,  animation,  graphics  and  sound  has  been  developed 
and  reported  in  ref.  1.  The  system  very  clearly 
demonstrates  to  users  how  a  high  wave  blocks  the  cross- 
section  of  a  pipe  and  the  resulting  liquid  plug  is 
accelerated  by  the  flow  of  steam.  Video  sequences 
provide  laboratory  visualization  of  the  phenomenon 
occurring  in  a  glass  pipe  and  photographs  of  pipes 
damaged  by  plug  acceleration  make  direct  connection 
with  actual  work  situations  and  point  to  the  significance 
of  the  problem  in  malfunctions  that  have  actually 
occurred  in  several  plant  environments.  The  authors  of 
the  system  have  included  a  large  body  of  fundamental 
scientific  knowledge  and  technical  expertise  within  the 
system  including  possible  ways  of  preventing  damage, 
and  user  access  to  system  diagrams  and  operating  manuals.

Hypermedia  systems  feature  flexible 
information  structures  and  offer  great  browsing  freedom 
for  the  users.  Yet,  the  advantages  of  flexibility  and 
freedom  come  with  a  potential  cost:  user  disorientation 
and  confusion  or  what  is  popularly  referred  to  as  the 
problem  of  being  "lost  in  hyperspace."  Previous 
attempts  to  address  this  problem  have  concentrated  on 
improving  the  user  interface  through  multiple  windows, 
maps,  and  tours  or  path  mechanisms.  Unfortunately, 
multiple  windows  help  only  in  a  limited  localized  way, 
and  maps  are  hard  to  design  and  maintain  in  nuclear 
power  plant  applications  when  hundreds  or  thousands  of 
nodes  may  be  involved. 

Another  attempt  to  address  the  user 
disorientation  issue  is  textual  analysis,  where  the 
designers  of  the  hypermedia  system  statistically  analyze 
word  and  concept  frequencies  in  order  to  index  nodes  by 
significant  terms,  facilitating  retrieval  and  navigation 
among  relevant  documents.24  Textual  analysis  provides 
only  partial  solution  to  the  disorientation  problem, 
however,  for  it  is  important  for  readers  to  individually
grasp  the  information  structure  by  utilizing  their  own 
criteria  and  respond  to  unique  application  needs.  Given 
the  diversity  of  user  backgrounds  and  application 
requirements,  one  possible  way  to  achieve  this  appears  to 
be  through  fuzzy  modeling. 

Information-based  measures  may  be  used  to 
categorize  the  links  of  a  hypermedia  system  in  a  number 
of  different  ways.25  One  scheme  of  classification  (see 
ref.  3)  divides  all  links  into:  relative  position  links,  hard 
links  and  return  links.  Relative  position  links  connect 
two  nodes  a  pre-defined  distance  apart.  Examples  of 
such  links  are  the  "Go  Next"  or  "Go  Previous" 
commands  allowing  a  user  to  go  forward/backward 
through  one  node  at  a  time.  Hard  links,  on  the  other
hand,  are  connections  made  from  a  node  to  another 
specific  node,  such  as  for  example  linking  from  any  node 
to  a  common  map  or  a  home  page.  Hard  links  are  used 
in  conjunction  with  conditional  structures  (e.g., 
IF/THEN/ELSE)  to  control  branching  and  to  control  the 
display  of  hidden  fields  containing  information.  Finally, 
return  links  connect  users  to  a  destination  node  and  then 
take  them  back  to  where  the  link  began,  in  a  way  forcing 
them  to  return  to  the  point  of  departure.  Return  links  are 
useful  for  permitting  user  access  to  reference  nodes  and 
imposing  designer-controlled  constraints  on  where  users 
may  go  in  the  information  space. 
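
The three link classes can be sketched as operations on a simple navigator. This is an illustrative assumption about how such links might behave, not the implementation of the classification scheme in ref. 3; the class, its method names, and the node labels are hypothetical.

```python
class Navigator:
    def __init__(self, ordered_nodes, home):
        self.nodes = list(ordered_nodes)
        self.home = home
        self.current = self.nodes[0]
        self.return_stack = []   # departure points of pending return links

    def _check(self, node):
        if node not in self.nodes:
            raise KeyError(node)

    def relative(self, offset):
        """Relative position link: +1 is 'Go Next', -1 is 'Go Previous'."""
        i = self.nodes.index(self.current) + offset
        if 0 <= i < len(self.nodes):
            self.current = self.nodes[i]
        return self.current

    def hard(self, target=None):
        """Hard link: jump to one specific node (the home node by default)."""
        target = self.home if target is None else target
        self._check(target)
        self.current = target
        return self.current

    def follow_return_link(self, reference_node):
        """Return link, outbound leg: remember where the link began."""
        self._check(reference_node)
        self.return_stack.append(self.current)
        self.current = reference_node
        return self.current

    def go_back(self):
        """Return link, inbound leg: go back to the point of departure."""
        if self.return_stack:
            self.current = self.return_stack.pop()
        return self.current


nav = Navigator(["TOC", "10.1", "10.2", "10.3", "REFS"], home="TOC")
visited = [
    nav.relative(+1),                 # Go Next
    nav.relative(+1),                 # Go Next
    nav.follow_return_link("REFS"),   # consult a reference node
    nav.go_back(),                    # forced return to the departure point
    nav.hard(),                       # hard link to the home node
]
print(visited)
```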

For  the  purpose  of  analyzing  a  hypermedia 
structure  and  facilitating  its  design  through  information 
measures,  it  is  useful  to  distinguish  between  hierarchical 
(organizational)  and  associational  (referential)  links. 
The  former  are  used  to  create  a  hierarchy  while  the  latter 
are  used  to  cross-reference  information. 

3.  HYPERMEDIA  NAVIGATION  USING  FUZZY 
GRAPHS 

In  this  section  we  develop  a  navigational 
methodology  for  hypermedia  on  the  basis  of  the 
generalized  theories  for  information  retrieval26,27  and  the
theory  of  fuzzy  graphs.28  Fuzzy  graphs  have  been  used 
in  connection  with  fault  diagnosis  in  nuclear
applications.  In  earlier  work  fuzzy  graphs  have  also
been  used  in  connection  with  hypermedia  integration  and 
navigation  algorithms.30 

A  fuzzy  graph  is  defined  as  follows.  Let  $E_1$
and  $E_2$  be  two  sets  and  let  $x \in E_1$  and  $y \in E_2$.  The
Cartesian  product  of  the  two  sets  is  the  set  of  ordered
pairs  $E_1 \times E_2$.  A  fuzzy  graph  is  defined  as  the  fuzzy
subset  $G$  such  that

$$\forall (x, y) \in E_1 \times E_2 : \mu_G(x, y) \in M$$

where  $\mu_G(x, y)$  is  a  membership  function,  and  $M$  is
the  membership  set  on  $E_1 \times E_2$.  The  membership
function  $\mu_G(x, y)$  is  a  mechanism  for  grading  each
element  of  the  relation  represented  by  the  graph.

A  fuzzy  graph  is  a  graph  where  the  links  are  not 
Boolean,  i.e.,  either  0  or  1,  but  graded  by  a  membership 
function  which  takes  values  in  the  interval  [0,  1].  A
hypermedia  system  may  be  viewed  as  a  fuzzy  graph 
where  different  nodes  are  connected  with  each  other 
through  some  graded  link  reflecting  application-specific 
or  in  general  user-specified  criteria.  Thus  the  navigation 
problem  in  the  information  hyperspace  is  translated  to  a 
problem  of  traversing  a  graph  in  accordance  with  some 
directional  aids  quantified  through  fuzzy  membership 
functions. 
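
One way to make graded navigation concrete is sketched below. It assumes that link grades are stored as numbers in [0, 1], that a path is only as strong as its weakest link (the usual max-min convention for fuzzy graphs), and that weak links can be hidden with a threshold (an alpha-cut). These are illustrative assumptions rather than the specific algorithm of this paper, and the node names and grades are invented.

```python
import heapq


def strongest_path(graph, start, goal):
    """Return (strength, path) maximizing the minimum link grade on the path."""
    best = {start: 1.0}
    heap = [(-1.0, start, [start])]   # max-heap on strength via negation
    while heap:
        neg_strength, node, path = heapq.heappop(heap)
        strength = -neg_strength
        if node == goal:
            return strength, path
        if strength < best.get(node, 0.0):
            continue                  # stale entry; a stronger route is known
        for nxt, grade in graph.get(node, {}).items():
            s = min(strength, grade)  # a path is as strong as its weakest link
            if s > best.get(nxt, 0.0):
                best[nxt] = s
                heapq.heappush(heap, (-s, nxt, path + [nxt]))
    return 0.0, []


def alpha_cut(graph, alpha):
    """Keep only the links whose membership grade is at least alpha."""
    return {
        u: {v: g for v, g in nbrs.items() if g >= alpha}
        for u, nbrs in graph.items()
    }


# Hypothetical grades mu_G(x, y) for one user context.
graph = {
    "entrance": {"fabrication": 0.9, "production": 0.4},
    "fabrication": {"target_photo": 0.3, "production": 0.8},
    "production": {"target_photo": 0.7},
    "target_photo": {},
}

strength, path = strongest_path(graph, "entrance", "target_photo")
pruned = alpha_cut(graph, 0.5)
print(strength, path)
```

Changing the membership grades for a different user or task changes both the pruned graph and the recommended path, which is how context-dependent criteria would steer navigation in this sketch.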

4.  A  HYPERMEDIA  PROTOTYPE  FOR  THE 
MHTGR-NPR 

The  Special  Material  Accounting  System 
(SMAS)  is  a  hypermedia  prototype  designed  to  provide 
comprehensive  monitoring  and  surveillance  of  the  tritium 
produced  in  the  Modular  High  Temperature  Gas  Cooled 
New  Production  Reactor  (MHTGR-NPR).  The  objective 
of  the  system  is  to  facilitate  meeting  required  tritium 
accountability  criteria  to  within  10  gm  of  total  inventory. 
These  criteria  are  established  in  Plant  Design 
Requirements  Documents  along  with  the  overall 
performance,  functional,  interface,  operational,  safety, 
maintenance,  inspection  and  decommissioning 
requirements  for  the  plant.  An  advantage  of  a 
hypermedia  system  in  this  context  is  that  the  multitude  of 
data  and  information  involved  in  monitoring  the 
production  of  special  material  over  long  periods  of  time, 
can  be  integrated  with  textual,  visual  and  audible  data  in 
a  manner  that  facilitates  the  desired  accountability. 
SMAS  was  developed  in  the  HyperCard  environment 
which  includes  the  object-oriented  programming 
language  called  HyperTalk.  It  consists  of  three  major 
subsystems  referred  to  as  Target  Fabrication,  MHTGR- 
NPR  Production,  and  Target  Processing.  These 
subsystems  correspond  to  the  three  major  phases  in  the 
history  of  the  production  process,  i.e.,  the  fabrication 
phase,  the  production  phase  (target  inside  the  reactor), 
and  the  processing  phase.  Once  a  reliable,  tested  and 
validated  version  of  the  system  has  been  developed,  it 
could  possibly  be  integrated  with  the  Production 
Assurance  Protection  System  of  the  Plant. 

Figure  2  illustrates  the  overall  architecture  of 
the  Special  Material  Accounting  System.  SMAS  is 
designed  to  have  access  to  historical  as  well  as  current 
data,  and  plant  information  and  knowledge  as  well  as 
access  to  external  networks.  In  SMAS,  the  root  node  is 
an  entrance  stack  of  cards  linked  to  all  other  stacks. 
Buttons  that  link  to  a  data  acquisition  program  are 
indicated  by  the  button  "Data".  Buttons  that  link  to  local 
historical  data  are  indicated  by  "Information"  and  buttons 
that  include  textual  or  pictorial  data  are  indicated  by  the 
button  "Knowledge."  A  sequence  of  jumps  to  a  screen 
containing  pictorial  information  about  the  target  elements
is  achieved  by  pressing  the  appropriate  buttons.  The 
hypermedia  information  measures  outlined  in  sections  2 
and  3  were  used  for  the  development  of  the  system,  and 
the  fuzzy  graph  based  methodology  of  section  4  provides 
the  navigational  support. 

HyperCard  is  based  on  the  card  computational
metaphor,  where  a  node  can  be  either  a  card  or  a
collection  of  cards  called  a  stack.  Buttons  are
constructed  on  the  screen  (through  a  HyperTalk 
program)  and  can  be  activated  when  a  user  clicks  on 
them.  Through  HyperTalk  however,  other  events  can 
trigger  button  actions;  for  example,  when  the  cursor 
enters  the  button  region  or  when  a  specified  time  has 
elapsed.  A  major  advantage  of  HyperCard  is  that  links 
do  not  need  to  be  hard-wired.  Through  HyperTalk 
anything  that  can  be  computed  can  be  used  as  the 
destination  for  a  link.  This  is  an  advantageous  feature 
for  the  purpose  of  integration.  Goto  statements  achieve 
hypermedia  jumps  or  links.  Commands  such  as  show 
and  hide  can  simulate  pop-up  windows. 

6.  SUMMARY 

Hypermedia  is  a  technology  that  takes 
advantage  of  the  phenomenal  availability  of  computer 
storage  and  networking  capabilities.  It  allows  for  the 
development  of  large  information  spaces,  a  possibility  of 
considerable  merit  for  nuclear  plant  operations.  While  it 
is  feasible  to  store  maintenance  and  operations  data, 
information  and  knowledge  to  an  unprecedented  extent, 
hypermedia  utilization  is  limited  by  factors  such  as  user 
disorientation  during  access  of  the  large  information 
spaces  involved.  To  overcome  this  problem  we  have 
presented  a  set  of  information  measures  primarily 
applicable  during  the  design  of  such  systems  and  a  fuzzy 
graph  methodology  pertinent  during  the  use  of  the 
system,  i.e.,  for  navigation.  The  development  of 
hypermedia  systems  where  information  can  be  retrieved 
and  utilized  to  satisfy  a  variety  of  needs,  requires  that  a 
user  can  navigate  through  enormous  hyperspaces  assisted 
by  tools  that  offer  context-specific  direction.  In  this 
paper  we  outlined  a  directional  tool  that  allows  a  user  to 
link  to  the  associated  nodes  in  accordance  with 
application-specific  criteria  embodied  in  membership 
functions.  Further  research  is  needed  to  experimentally 
test  and  verify  the  merit  of  such  approaches  in  actual 
environments.  An  example  of  a  hypermedia-based 
prototype  for  special-material  monitoring  was  presented. 
It  is  likely  that  navigational  tools  will  be  indispensable 
for  upscaling  development  and  large-size 
implementations  where  the  possibilities  opening  through 
hypermedia  for  improved  operations  in  nuclear  plants 
will  be  tested  and  realized. 

Figure  1.  A  schematic  illustration  of  the  hierarchical  information  structure  of  Chapter  10.

Figure  2.  The  overall  structure  of  the  hypermedia  prototype  SMAS  integrating  data,  with  information  and

Abstract

We treat a pursuit game in which a controlled object moving in the horizontal plane is pursued by another object moving in space. The horizontal plane acts as a state constraint for the pursuer. The goal of the pursuer is to bring the geometric coordinates and the velocities of the players into coincidence (soft landing) at some finite instant of time. We advance a method of pursuit for this problem which consists of three stages. At the first stage the pursuit is performed in geometric coordinates only. This stage ends at the so-called critical instant of time, the last instant from which the pursuer can still avoid meeting the horizontal plane at a nonzero angle. The phase states of the pursuer and the evader at the critical instant are then fixed, and at the second stage we solve an optimal control problem: the pursuer is transferred to the position the evader occupied at the critical instant, with a velocity that coincides in direction with the evader's velocity and exceeds it in magnitude. During the transition time the evader, of course, leaves this position. For this reason, at the third stage a pursuit in tracks on the horizontal plane is performed right up to the soft landing, after which the trajectory is held so that the geometric coordinates and velocities of the players continue to coincide.

Keywords: soft landing, differential game, set-valued mapping, dynamic system, pursuit in tracks, quasistrategy, optimal control.


1  Introduction 

In the theory of differential games there are a number of constructive methods that make it possible to solve rather wide classes of problems under various information conditions. The strategies most often employed are stroboscopic strategies [4,11], positional strategies [5,19], and quasistrategies [6,14,17]. For the game to terminate successfully from the pursuer's point of view, the pursuer needs a certain advantage in control resources. The most typical and most frequently employed form of this advantage is Pontryagin's Condition and its various modifications [4,6,11,14,16,17,18].

The specific feature of the soft-landing pursuit problem, suggested by J. Albus and A. Meystel [1] as "The Eagle Snatch" problem, is that not only Pontryagin's Condition but also all its typical modifications [6,14] fail for it. The sole exception is the analogue of Pontryagin's Condition that involves prolongation of the pursuit time and allowance for memory [18]. But even in this case, constructing an efficient pursuit procedure on the basis of one of the above-mentioned methods is a rather difficult problem.

The approach advanced in this paper is a combined one, in keeping with the ideology of Artificial Intelligence [2,3]. The process of soft landing is partitioned into three stages. Such a procedure seems natural and clear from the physical point of view. Of course, one cannot speak of optimality of the total time of pursuit, although the behavior of the pursuer at each stage is guided by considerations of optimality. Nonetheless, conditions on the parameters of the conflict-controlled process are derived that ensure finiteness of the time of pursuit for any initial position. The controls of the pursuer that achieve the pursuer's goal are also given in explicit form.

2    Problem  Statement 

The motion of the pursuer in three-dimensional space is subject to the equation

$$\ddot{x} + \alpha \dot{x} = \rho u, \quad \|u\| \le 1, \qquad (2.1)$$

where $x = (x_1, x_2, x_3)$ are the geometric coordinates of the object: $x_1$, $x_2$ are the coordinates in the horizontal plane and $x_3$ is the height. The vectors $\dot{x}$ and $\ddot{x}$ are the velocity and the acceleration respectively, the number $\alpha > 0$ is a coefficient of friction, $\rho > 0$ is a force coefficient, and $u$ is a control parameter taking its values in the unit ball centered at the origin of the space $R^3$; the function $u(t)$, $t \ge 0$, is assumed to be Lebesgue measurable [20,24].

The motion of the evader evolves in the plane in the following way:

$$\ddot{y} + \beta \dot{y} = \sigma v, \quad \|v\| \le 1, \qquad (2.2)$$

where $y = (y_1, y_2)$, $\dot{y}$ and $\ddot{y}$ are the velocity and the acceleration, $\beta > 0$ is a friction coefficient, $\sigma > 0$ is a force coefficient, and $v$ is the evader's control parameter, which takes its values in the planar unit ball centered at zero and is a measurable function of time. In order to treat $y$ as a vector in $R^3$ we shall sometimes write $\tilde{y} = (y_1, y_2, 0)$ or $\tilde{y} = (y, 0)$.

We shall analyse the game (2.1), (2.2) from the pursuer's side. The pursuer should perform a soft meeting with the evader at some finite instant of time; that is, at this instant the following inequalities should hold:

$$\|x - y\| \le \varepsilon_1, \quad \|\dot{x} - \dot{y}\| \le \varepsilon_2. \qquad (2.3)$$

The hyperplane $\{x_3 = 0\}$ serves as the state constraint for the pursuer. The pursuer is allowed to move in this hyperplane without intersecting it. If moving in the plane is not permitted, the trajectory can be shifted by some $\delta > 0$ above the horizontal plane, and the further considerations remain valid for this case too.

Without loss of generality we can assume that $x_3^0 = x_3(0) > 0$, that is, at the initial instant of time the pursuer is in the upper halfspace.

In the sequel, for simplicity's sake, we shall assume that $\varepsilon_1 = \varepsilon_2 = 0$, that is, the precision soft landing will be treated. The passage from the solution of this problem to that of problem (2.3) is trivial: the solution of problem (2.3) follows immediately from the solution of the problem on the precision soft landing.

For convenience of further investigation of the problem (2.1)-(2.3) we reduce the second-order systems (2.1), (2.2) to a first-order system with the help of the substitution

$$z_1 = x, \quad z_2 = \dot{x}, \quad z_3 = y, \quad z_4 = \dot{y}.$$

Differentiating these equations in $t$ and taking (2.1), (2.2) into account, we come to the equivalent system

$$\begin{aligned} \dot{z}_1 &= z_2, \\ \dot{z}_2 &= -\alpha z_2 + \rho u, \\ \dot{z}_3 &= z_4, \\ \dot{z}_4 &= -\beta z_4 + \sigma v. \end{aligned} \qquad (2.4)$$

From the formal point of view the linear system (2.4) is a system of 12th order, but in fact only of 10th, since the vectors $z_1$, $z_2$, $z_3$, $z_4$ are three-dimensional and the last two have zero third components.

The problem facing the pursuer is to bring a trajectory of system (2.4) from a given initial state to the four-dimensional subspace

$$M_0: \; z_1 = z_3, \quad z_2 = z_4, \qquad (2.5)$$

lying in the direct product of four planes, for any admissible counteractions of the evader.

The  condition  on  the  availability  of  information  to  the 
pursuer  in  the  course  of  the  game  will  be  further  specified 
during  the  solution  of  the  problem  at  hand. 

Denote the phase states of the players as follows:

$$\mathbf{x} = (x, \dot{x}) = (z_1, z_2), \quad \mathbf{y} = (y, \dot{y}) = (z_3, z_4).$$

3    Rough  Method  of  Solution

The game problem (2.4), (2.5) is a linear differential game. The matrix of system (2.4) has the form

$$A = \begin{pmatrix} O_3 & E_3 & O_3 & O_3 \\ O_3 & -\alpha E_3 & O_3 & O_3 \\ O_3 & O_3 & O_3 & \hat{E} \\ O_3 & O_3 & O_3 & -\beta \hat{E} \end{pmatrix}, \qquad \hat{E} = \begin{pmatrix} E_2 & O \\ O & O_1 \end{pmatrix}.$$

The fundamental matrix is

$$e^{At} = \begin{pmatrix} E_3 & \frac{1 - e^{-\alpha t}}{\alpha} E_3 & O_3 & O_3 \\ O_3 & e^{-\alpha t} E_3 & O_3 & O_3 \\ O_3 & O_3 & E_3 & \frac{1 - e^{-\beta t}}{\beta} \hat{E} \\ O_3 & O_3 & O_3 & E_3 - (1 - e^{-\beta t}) \hat{E} \end{pmatrix}.$$

Here $E_i$, $O_i$ are respectively the unit and zero matrices of orders $i = 1, 2, 3$. We now verify Pontryagin's Condition [4].
In so doing, one should take into account that the orthogonal complement to the subspace $M_0$ is the eight-dimensional space $L$ consisting of vectors of the form

$$(p_1, q_1, p_2, q_2, -p_1, q_3, -p_2, q_4),$$

where $p_1$, $p_2$ are arbitrary two-dimensional and $q_i$, $i = 1, \dots, 4$, arbitrary one-dimensional vectors. It is on the subspace $L$ that the players' control resources are compared.

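Thus, Pontryagin's Condition [4,6,16] for the problem (2.4), (2.5) is

$$U(t) \overset{*}{-} V(t) \ne \emptyset \quad \forall t \ge 0, \qquad (3.1)$$

where

$$U(t) = \left\{ \left( \frac{1 - e^{-\alpha t}}{\alpha} \rho u, \; e^{-\alpha t} \rho u \right) : \|u\| \le 1 \right\},$$

$$V(t) = \left\{ \left( \frac{1 - e^{-\beta t}}{\beta} \sigma \tilde{v}, \; e^{-\beta t} \sigma \tilde{v} \right) : \tilde{v} = (v, 0), \; \|v\| \le 1 \right\},$$

and $\overset{*}{-}$ is the operation of geometric subtraction, or Minkowski difference [4,6]:

$$X \overset{*}{-} Y = \{ z : z + Y \subset X \} = \bigcap_{y \in Y} (X - y).$$

Note that $U(t)$ and $V(t)$ are convex-valued, compact-valued mappings acting from $[0, \infty)$ to $2^{R^6}$.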

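The relationship (3.1) at fixed $t$ implies that a six-dimensional vector $d = (d_1, d_2)$, $d_i \in R^3$, exists such that for any $\tilde{v} = (v, 0)$, $\|v\| \le 1$, the equations for $u$, $\|u\| \le 1$,

$$\begin{aligned} \frac{1 - e^{-\beta t}}{\beta} \sigma \tilde{v} + d_1 &= \frac{1 - e^{-\alpha t}}{\alpha} \rho u, \\ e^{-\beta t} \sigma \tilde{v} + d_2 &= e^{-\alpha t} \rho u \end{aligned} \qquad (3.2)$$

are solvable. This occurs, in particular, at $v = 0$. Therefore

$$d_1 = \frac{1 - e^{-\alpha t}}{\alpha} \rho u_0, \quad d_2 = e^{-\alpha t} \rho u_0,$$

where $u_0$, $\|u_0\| \le 1$, is the solution to system (3.2) corresponding to $v = 0$. In other words, the system of equations for $u$, $\|u\| \le 1$,

$$\begin{aligned} \frac{1 - e^{-\beta t}}{\beta} \sigma \tilde{v} &= \frac{1 - e^{-\alpha t}}{\alpha} \rho (u - u_0), \\ e^{-\beta t} \sigma \tilde{v} &= e^{-\alpha t} \rho (u - u_0) \end{aligned}$$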
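has a solution for any $\tilde{v} = (v, 0)$, $\|v\| \le 1$. Consequently the equations

$$\begin{aligned} \frac{1 - e^{-\beta t}}{\beta} \sigma \tilde{v} &= \frac{1 - e^{-\alpha t}}{\alpha} \rho u, \\ e^{-\beta t} \sigma \tilde{v} &= e^{-\alpha t} \rho u, \quad u \in R^3, \; \|u\| \le 2, \end{aligned} \qquad (3.3)$$

have a solution $u$ for any $\tilde{v}$, $\|v\| \le 1$. Evidently $u = \lambda \tilde{v}$, where $\lambda$ is a certain number which is positive for $v \ne 0$. Let us fix some $v \ne 0$. Then from the second equation of (3.3) we have

$$\lambda = \frac{\sigma}{\rho} e^{(\alpha - \beta) t}.$$

Substituting the obtained expression for $\lambda$ into the first equation of (3.3), we have

$$\alpha + (\beta - \alpha) e^{-\beta t} - \beta e^{-(\beta - \alpha) t} = 0. \qquad (3.4)$$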
When $\alpha \ne \beta$ the left side of the relationship (3.4) may vanish only at a finite number of points (it has no more than two real roots). These are just the points at which Pontryagin's Condition holds when $\rho \ge \sigma$. If $\rho < \sigma$ this condition fails and the method does not work, because the evader can then avoid the meeting even in the geometric coordinates [4,25].

Thus, for the problem (2.4), (2.5) Pontryagin's Condition fails.

Nevertheless, the problem (2.4), (2.5) is in principle solvable in a finite time under certain conditions, which are given below. For this purpose we shall use the following technique.

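Set $\Delta_1 = z_1 - z_3$, $\Delta_2 = z_2 - z_4$. Then from (2.4), (2.5) we have

$$\dot{\Delta}_1 = \Delta_2, \quad \dot{\Delta}_2 = -\alpha z_2 + \beta z_4 + \rho u - \sigma v.$$

We add to and subtract from the right side of the second equation the value $\gamma \Delta_2$, where $\gamma > 0$ is some number. Then

$$\dot{\Delta}_1 = \Delta_2, \quad \dot{\Delta}_2 = -\gamma \Delta_2 - (\alpha - \gamma) z_2 + (\beta - \gamma) z_4 + \rho u - \sigma v. \qquad (3.5)$$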

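From the Cauchy formula for the equations (2.4) it follows that numbers $t^1_\varepsilon(z_2^0)$ and $t^2_\varepsilon(z_4^0)$ exist such that for $t \ge t^1_\varepsilon(z_2^0)$

$$\|(\alpha - \gamma) z_2(t)\| \le \frac{\rho}{\alpha} |\alpha - \gamma| + \varepsilon, \qquad (3.6)$$

and for $t \ge t^2_\varepsilon(z_4^0)$

$$\|(\beta - \gamma) z_4(t)\| \le \frac{\sigma}{\beta} |\beta - \gamma| + \varepsilon, \qquad (3.7)$$

where $\varepsilon > 0$ is an arbitrarily chosen small number. Note that we may set

$$t^1_\varepsilon(z_2^0) = \frac{\ln\left[\left(\|z_2^0\| + \frac{\rho}{\alpha}\right) |\alpha - \gamma|\right] - \ln \varepsilon}{\alpha}$$

and

$$t^2_\varepsilon(z_4^0) = \frac{\ln\left[\left(\|z_4^0\| + \frac{\sigma}{\beta}\right) |\beta - \gamma|\right] - \ln \varepsilon}{\beta}.$$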
Since the term $-(\alpha - \gamma) z_2 + (\beta - \gamma) z_4$ is bounded from some instant of time on, in order to suppress its influence with the help of the parameter $u$ we consider the function

$$f(\gamma) = \frac{\rho}{\alpha} |\alpha - \gamma| + \frac{\sigma}{\beta} |\beta - \gamma|. \qquad (3.8)$$


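Let us find its minimum. One can easily see that it is attained at $\gamma$ equal to either $\alpha$ or $\beta$, and therefore

$$\min_{\gamma} f(\gamma) = \min\left\{ \frac{\rho}{\alpha}, \frac{\sigma}{\beta} \right\} |\alpha - \beta|.$$

In order that the pursuer has an advantage in the right side of the equation (3.5) it is necessary that

$$\rho - \min\left\{ \frac{\rho}{\alpha}, \frac{\sigma}{\beta} \right\} |\alpha - \beta| > \sigma.$$

Since it is assumed that $\frac{\rho}{\alpha} \ge \frac{\sigma}{\beta}$ [4] even for the coincidence of geometric coordinates, the above inequality is equivalent to the following one:

$$\rho - \frac{\sigma}{\beta} |\alpha - \beta| > \sigma,$$

which in turn is equivalent to the inequalities

$$\rho > \sigma \ \text{ and } \ \frac{\rho}{\alpha} > \frac{\sigma}{\beta}, \quad \text{when } \alpha \ge \beta,$$

$$\rho > \left( 2 - \frac{\alpha}{\beta} \right) \sigma, \quad \text{when } \alpha < \beta.$$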

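Then for $t \ge \max\{ t^1_\varepsilon(z_2^0), t^2_\varepsilon(z_4^0) \}$ the equations (3.5) may be replaced by the equations

$$\dot{\Delta}_1 = \Delta_2, \quad \dot{\Delta}_2 = -\alpha \Delta_2 + \rho u - \eta, \qquad (3.9)$$

where the vector $\eta$ satisfies the inequality

$$\|\eta\| \le \sigma \left( 1 + \frac{1}{\beta} |\alpha - \beta| + \varepsilon \right),$$

and, in addition, $\rho > \sigma \left( 1 + \frac{1}{\beta} |\alpha - \beta| + \varepsilon \right)$.

It is known [5] that the game problem (3.9), as a problem for objects of the same type with the goal $\Delta_1 = 0$, $\Delta_2 = 0$, is equivalent to the problem of control of the system

$$\dot{\Delta}_1 = \Delta_2, \quad \dot{\Delta}_2 = -\alpha \Delta_2 + \zeta, \qquad (3.10)$$

where $\|\zeta\| \le \rho - \sigma \left( 1 + \frac{1}{\beta} |\alpha - \beta| \right)$, with the goal of bringing it to the origin.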

It will be shown below that the problem (3.10) has a solution in a finite time. Thus, if

$$\rho > \sigma \left( 1 + \frac{1}{\beta} |\alpha - \beta| \right), \qquad (3.11)$$

then the game problem (2.4), (2.5) has a solution in a finite time. This time may, however, be rather large, since it must be long enough for estimates of the kind (3.6), (3.7) to hold.
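
The parameter test of this section is easy to check numerically. The sketch below evaluates the function (3.8), its minimum over the two kinks, and the sufficient condition (3.11). The function names are hypothetical and the code is only an illustration of the formulas above, not part of the original method.

```python
def f(gamma, rho, sigma, alpha, beta):
    """Function (3.8): bound on the disturbance term for a given shift gamma."""
    return rho / alpha * abs(alpha - gamma) + sigma / beta * abs(beta - gamma)


def min_f(rho, sigma, alpha, beta):
    """Minimum of (3.8); a convex piecewise-linear function attains it at a kink."""
    return min(f(alpha, rho, sigma, alpha, beta), f(beta, rho, sigma, alpha, beta))


def rough_method_applies(rho, sigma, alpha, beta):
    """Sufficient condition (3.11): rho > sigma * (1 + |alpha - beta| / beta)."""
    if min(rho, sigma, alpha, beta) <= 0:
        raise ValueError("all parameters must be positive")
    return rho > sigma * (1.0 + abs(alpha - beta) / beta)
```

For example, with rho = 3, sigma = 1, alpha = 2, beta = 1 the minimum of (3.8) is attained at gamma = alpha and condition (3.11) holds, whereas lowering rho to 1.5 violates it.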


Below we advance another method of solving the problem (2.4), (2.5) in less time, though this time, generally speaking, is not optimal. The solution process consists of three stages, followed by holding the trajectory in the set (2.5). The first stage is a game problem of pursuit in geometric coordinates, the second is a control problem for a system of the type (3.10) on transition from one point to another, and the third stage is a pursuit in tracks with exceeding velocity. At the first and the second stages the state constraint of the pursuer must be handled carefully.

4    The  Problem  of  Pursuit 

Let system (2.4) at $t = 0$ be in the position

$$z^0 = (z_1^0, z_2^0, z_3^0, z_4^0). \qquad (4.1)$$

We shall treat the problem of pursuit for this system with the terminal set

$$M^*: \; z_1 = z_3. \qquad (4.2)$$

The solution to this problem is known and can be obtained in the class of stroboscopic strategies [4,16] on the basis of Pontryagin's First Direct Method, in the class of positional strategies with the help of the Extremal Targeting Rule of Krasovskii [5], or in the class of quasistrategies on the basis of the Method of Inverse Minkowski Functionals (Resolving Functions) [6,14]. The guaranteed time of pursuit is the same for all these methods and is optimal. We now dwell upon the last of the above-mentioned methods [6]. Pontryagin's function for the problem (2.4), (4.2) has the form


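$$W(t) = \frac{1 - e^{-\alpha t}}{\alpha} \rho S_3 \overset{*}{-} \frac{1 - e^{-\beta t}}{\beta} \sigma S_2 = \left[ \frac{1 - e^{-\alpha t}}{\alpha} \rho - \frac{1 - e^{-\beta t}}{\beta} \sigma \right] S_3 = \omega(t) S_3,$$

where $S_i$ is the unit ball in the space $R^i$, and Pontryagin's Condition holds if $\omega(t) \ge 0$. This is true when [4]

$$\rho \ge \sigma, \quad \frac{\rho}{\alpha} \ge \frac{\sigma}{\beta}; \qquad (4.3)$$

what is more, if at least one of the inequalities is strict, then $\omega(t) > 0$ for $t > 0$.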
Denote

$$\xi(t, z^0) = z_1^0 - z_3^0 + \frac{1 - e^{-\alpha t}}{\alpha} z_2^0 - \frac{1 - e^{-\beta t}}{\beta} z_4^0.$$

Then the time of the pursuit termination is

$$T = T(z^0) = \min\left\{ t \ge 0 : \|\xi(t, z^0)\| = \int_0^t \omega(\tau)\, d\tau \right\}. \qquad (4.4)$$

When $\kappa = \frac{\rho}{\alpha} - \frac{\sigma}{\beta} > 0$ the equation in the relationship (4.4) has a positive root for any $z^0$, since at $t = 0$ the left side is greater than the right one, and as $t \to +\infty$ the former remains bounded while the latter grows asymptotically linearly with the coefficient $\kappa$.
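
The root in (4.4) generally has to be found numerically. A minimal sketch, assuming a simple forward scan followed by bisection (a numerical device chosen here for illustration, not a procedure taken from the paper), is given below. All function names are hypothetical, and roots that lie closer together than the scan step `dt` may be missed.

```python
import math


def decay_integral(rate, t):
    """(1 - exp(-rate * t)) / rate, the weight of an initial velocity."""
    return (1.0 - math.exp(-rate * t)) / rate


def xi(t, z0, alpha, beta):
    """Vector xi(t, z0) preceding (4.4); z0 = (z1, z2, z3, z4), each a 3-tuple."""
    z1, z2, z3, z4 = z0
    ra, rb = decay_integral(alpha, t), decay_integral(beta, t)
    return tuple(a - c + ra * b - rb * d for a, b, c, d in zip(z1, z2, z3, z4))


def omega_integral(t, rho, sigma, alpha, beta):
    """Closed form of the integral of omega over [0, t]."""
    ia = (t - decay_integral(alpha, t)) / alpha
    ib = (t - decay_integral(beta, t)) / beta
    return rho * ia - sigma * ib


def capture_time(z0, rho, sigma, alpha, beta, dt=1e-3, t_max=1e3):
    """Smallest t >= 0 with ||xi(t, z0)|| equal to the integral of omega."""
    def gap(t):
        length = math.sqrt(sum(c * c for c in xi(t, z0, alpha, beta)))
        return length - omega_integral(t, rho, sigma, alpha, beta)

    if gap(0.0) <= 0.0:
        return 0.0
    t = 0.0
    while t < t_max:
        if gap(t + dt) <= 0.0:          # first sign change found on [t, t + dt]
            lo, hi = t, t + dt
            for _ in range(60):         # refine by bisection
                mid = 0.5 * (lo + hi)
                if gap(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return hi
        t += dt
    raise ValueError("no root found before t_max")
```

With the pursuer at rest one unit above an evader at rest and rho = 2, sigma = alpha = beta = 1, equation (4.4) reduces to t + exp(-t) = 2, which gives a convenient check of the routine.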

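In order to find the control ensuring the capture, we introduce the resolving function of the problem (2.4), (4.2) according to the technique developed in [6,14]. Set

$$\alpha(t, \tau, z^0, v) = \left\{ \left( \frac{1 - e^{-\beta (t - \tau)}}{\beta} \sigma v, \; \xi(t, z^0) \right) + \left[ \left( \frac{1 - e^{-\beta (t - \tau)}}{\beta} \sigma v, \; \xi(t, z^0) \right)^2 + \|\xi(t, z^0)\|^2 \left( \left( \frac{1 - e^{-\alpha (t - \tau)}}{\alpha} \right)^2 \rho^2 - \left( \frac{1 - e^{-\beta (t - \tau)}}{\beta} \right)^2 \sigma^2 \|v\|^2 \right) \right]^{1/2} \right\} \|\xi(t, z^0)\|^{-2},$$

where $v$ is understood as the vector $(v, 0) \in R^3$.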
Then, if $\xi(T, z^0) \ne 0$, on the active segment of pursuit $[0, t_*)$, where $t_*$ is the instant at which the test function [6]

$$h(t) = 1 - \int_0^t \alpha(T, \tau, z^0, v(\tau))\, d\tau$$

vanishes, we choose the pursuer's control in the form

$$u(\tau) = \left[ \frac{1 - e^{-\alpha (T - \tau)}}{\alpha} \rho \right]^{-1} \left[ \frac{1 - e^{-\beta (T - \tau)}}{\beta} \sigma v(\tau) - \alpha(T, \tau, z^0, v(\tau))\, \xi(T, z^0) \right]. \qquad (4.5)$$

On the passive segment $[t_*, T]$ we set

$$u(\tau) = \left[ \frac{1 - e^{-\alpha (T - \tau)}}{\alpha} \rho \right]^{-1} \frac{1 - e^{-\beta (T - \tau)}}{\beta} \sigma v(\tau). \qquad (4.6)$$

If, otherwise, $\xi(T, z^0) = 0$, then the control of the pursuer for $\tau \in [0, T]$ is chosen in the form (4.6). It corresponds to the control by Pontryagin's First Direct Method [6,16]. This control brings a trajectory of the process (2.4) to the set (4.2) at time $T$; in other words, it guarantees the capture of the evader by the pursuer at time $T$ starting from the initial state $z^0$.
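
To make the construction concrete, the sketch below computes the resolving function as the larger root of the quadratic equation stating that the control (4.5) has unit norm, and then evaluates the controls of the active and passive segments. It assumes that the evader's control is passed as a three-dimensional vector with zero third component and that tau < T. The function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
import math


def decay_integral(rate, s):
    """(1 - exp(-rate * s)) / rate."""
    return (1.0 - math.exp(-rate * s)) / rate


def dot(a, b):
    """Scalar product of two vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def resolving_function(T, tau, xi_T, v, rho, sigma, alpha, beta):
    """Largest a >= 0 with ||r_beta * sigma * v - a * xi_T|| = r_alpha * rho."""
    radius = decay_integral(alpha, T - tau) * rho
    b = [decay_integral(beta, T - tau) * sigma * c for c in v]
    xi2 = dot(xi_T, xi_T)
    bx = dot(b, xi_T)
    disc = bx * bx + xi2 * (radius * radius - dot(b, b))
    return (bx + math.sqrt(disc)) / xi2


def active_control(T, tau, xi_T, v, rho, sigma, alpha, beta):
    """Control of the form (4.5), used while the test function is positive."""
    a = resolving_function(T, tau, xi_T, v, rho, sigma, alpha, beta)
    radius = decay_integral(alpha, T - tau) * rho
    rb = decay_integral(beta, T - tau) * sigma
    return [(rb * vi - a * xi_i) / radius for vi, xi_i in zip(v, xi_T)]


def passive_control(T, tau, v, rho, sigma, alpha, beta):
    """Control of the form (4.6), used after the test function vanishes."""
    radius = decay_integral(alpha, T - tau) * rho
    rb = decay_integral(beta, T - tau) * sigma
    return [rb * vi / radius for vi in v]
```

By construction the active control lies on the boundary of the unit ball, while the passive control has norm not exceeding one whenever Pontryagin's Condition (4.3) holds.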

We assume that the game evolves under perfect information, that is, in the course of the game information on the current states of the players is available to both players.

Note the following circumstance. At the time of the evader's capture the velocity of the pursuer $z_2(T)$ may form an obtuse angle with the normal to the horizontal plane. This would mean that the pursuer subsequently intersects the horizontal plane, that is, violates the state constraint. For this reason the process of pursuit is stopped at a certain instant of time $\tau_*$, $\tau_* \le T$.

Below we describe how this instant is chosen. Denote by $D(\mathbf{x}(\tau_*), t)$ the attainable set in geometric coordinates of the system (2.1) at time $t$ from the state $\mathbf{x}(\tau_*) = (x(\tau_*), \dot{x}(\tau_*))$. Then

$$D(\mathbf{x}(\tau_*), t) = x(\tau_*) + \frac{1 - e^{-\alpha (t - \tau_*)}}{\alpha} \dot{x}(\tau_*) + \int_{\tau_*}^{t} \frac{1 - e^{-\alpha (t - \tau)}}{\alpha} \rho \, d\tau \, S_3.$$

The integral of this set-valued mapping is defined in the standard way [20] as the union of the integrals of measurable selections of the mapping. In our case the integral of a ball of variable radius is a ball whose radius equals the integral of the radius of the initial ball [6,14]. Set

$$\tau_* = \max_{\tau \in [0, T]} \left\{ \tau : D(\mathbf{x}(\tau), t) \cap \{ x_3 \ge 0 \} \ne \emptyset \ \ \forall t \ge \tau \right\},$$

where $\{ x_3 \ge 0 \}$ is the upper halfspace (above the horizontal plane).

If the motion of the pursuer is to evolve at a distance $\delta > 0$ above the horizontal plane, then for the evaluation of the instant $\tau_*$ the halfspace $\{ x_3 \ge 0 \}$ should be replaced by the halfspace $\{ x_3 \ge \delta \}$.

Evidently, $\tau_*$ is just the last instant of time at which the pursuer may avoid an unwanted meeting with the horizontal plane. He is allowed only to touch it, not to intersect it.

Thus the process of pursuit will be performed only on the interval $[0, \tau_*]$.
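
Since the attainable set is a ball, it meets the halfspace above the plane exactly when the height of its centre plus its radius is not below the plane. The sketch below uses this observation to test, on a finite time grid, whether a given state still admits avoidance of the plane. The function names, the finite horizon, and the grid are assumptions made for illustration.

```python
import math


def top_of_attainable_set(x3, v3, s, rho, alpha):
    """Greatest height reachable s time units later: centre height plus radius."""
    r = (1.0 - math.exp(-alpha * s)) / alpha
    centre = x3 + r * v3                 # free motion of the height coordinate
    radius = rho * (s - r) / alpha       # integral of the ball radius over [0, s]
    return centre + radius


def can_avoid_plane(x3, v3, rho, alpha, delta=0.0, horizon=50.0, steps=5000):
    """True if the attainable set meets {x3 >= delta} at every grid instant."""
    return all(
        top_of_attainable_set(x3, v3, horizon * k / steps, rho, alpha) >= delta
        for k in range(steps + 1)
    )
```

Scanning the pursuit trajectory for the last instant at which `can_avoid_plane` is still true would give a numerical estimate of the critical instant; the accuracy is limited by the grid.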

5    The  Problem  on  System  Transition  from  One  Point  to  Another

Let us fix the instant $\tau_*$ and the corresponding phase states of the players $\mathbf{x}(\tau_*)$ and $\mathbf{y}(\tau_*)$. We shall analyse the problem of optimal control of system (2.1) on the fastest transition from the state $\mathbf{x}(\tau_*)$ into the state $(y(\tau_*), \dot{y}(\tau_*)(1 + \varepsilon))$, $0 < \varepsilon < 1$. Thus, the problem is to achieve in the shortest time the simultaneous coincidence of the players' geometric coordinates and of the directions of their velocities, with the pursuer having an advantage in the magnitude of the velocity. Of course, the state constraints should not be violated in so doing.

The number $\varepsilon$ accounts for the magnitude of the desired advantage of the pursuer over the evader in velocity at the point $y(\tau_*)$. By virtue of the Cauchy formula the attainable set of the system (2.1) in six-dimensional space has the form

$$D(\mathbf{x}_0, t) = \Phi(t) \mathbf{x}_0 + \int_0^t \left\{ \Phi(\tau) \begin{pmatrix} 0 \\ \rho u \end{pmatrix} : \|u\| \le 1 \right\} d\tau, \qquad (5.1)$$

where $\Phi(t) = \begin{pmatrix} 1 & \frac{1 - e^{-\alpha t}}{\alpha} \\ 0 & e^{-\alpha t} \end{pmatrix} \otimes E_3$ ($\otimes$ is the Kronecker product) is a fundamental matrix of the homogeneous system, and $\mathbf{x}_0 = (z_1^0, z_2^0)$ is the initial phase state. This set is convex and closed. Therefore the fact that the point $\mathbf{x}^* = (z_1^*, z_2^*) = (y(\tau_*), \dot{y}(\tau_*)(1 + \varepsilon))$ lies in the set (5.1) can be expressed in terms of support functions in the following way:

$$(p_1, z_1^*) + (p_2, z_2^*) \le \left( p_1, z_1^0 + \frac{1 - e^{-\alpha t}}{\alpha} z_2^0 \right) + \left( p_2, e^{-\alpha t} z_2^0 \right) + \int_0^t \rho \max_{\|u\| \le 1} \left( \frac{1 - e^{-\alpha \tau}}{\alpha} p_1 + e^{-\alpha \tau} p_2, \; u \right) d\tau \qquad (5.2)$$

for all $p = (p_1, p_2) \in R^6$, $\|p\| = 1$. Above, the integral and the maximum are interchanged on the basis of Lyapunov's theorem on vector measures [20].

Thus, the question of the possibility and the duration of the transition of the system (2.1) from the point $\mathbf{x}_0$ to the point $\mathbf{x}^*$ reduces to the question of the existence of a positive root of the following equation for $t$:

$$\min_{\|p\| = 1} \left[ \left( p_1, z_1^0 - z_1^* + \frac{1 - e^{-\alpha t}}{\alpha} z_2^0 \right) + \left( p_2, e^{-\alpha t} z_2^0 - z_2^* \right) + \int_0^t \rho \left\| \frac{1 - e^{-\alpha \tau}}{\alpha} p_1 + e^{-\alpha \tau} p_2 \right\| d\tau \right] = 0. \qquad (5.3)$$

Such a root exists. Denote the least of the roots by $\theta$. To it there corresponds a vector $p^* = (p_1^*, p_2^*)$ which furnishes the minimum in (5.3).

In accordance with the Maximum Principle of Pontryagin the program control of the pursuer satisfies the relationship

$$(p(t), \tilde{u}(t)) = \max_{\|u\| \le 1} (p(t), \tilde{u}), \quad t \in [0, \theta], \qquad (5.4)$$

where $\tilde{u} = (0, u)$, and $p(t)$ is a solution of the conjugate system

$$\dot{p}(t) = \begin{pmatrix} 0 & 0 \\ -1 & \alpha \end{pmatrix} \otimes E_3 \, p(t), \quad p(\theta) = p^*.$$

6    Pursuit  in  Tracks

Thus, at the time $\theta$ the following relationships are true:

$$x(\theta) = y(\tau_*), \quad \dot{x}(\theta) = \dot{y}(\tau_*)(1 + \varepsilon), \quad \tau_* < \theta.$$

Set

$$\tau(t) = \tau_* + (1 + \varepsilon) t - \frac{\varepsilon^2 t^2}{4 (\theta - \tau_*)}, \quad t \ge 0. \qquad (6.1)$$

One can easily verify that

$$\begin{aligned} &\tau(0) = \tau_*, \quad \dot{\tau}(0) = 1 + \varepsilon, \\ &\tau(t) < \theta + t, \quad 0 \le t < t^*, \\ &\tau(t^*) = \theta + t^*, \quad \dot{\tau}(t^*) = 1, \end{aligned} \qquad (6.2)$$

where $t^* = \frac{2 (\theta - \tau_*)}{\varepsilon}$. Then, beginning from the instant $\theta$, at each instant of time $\theta + t$, $t \ge 0$, the pursuer will use the information on $\mathbf{x}(\theta + t)$, $\mathbf{y}(\tau(t))$, and $v(\tau(t))$, striving to perform the soft meeting with the evader by moving in his tracks, that is, by holding the relationship

$$x(\theta + t) = y(\tau(t)), \quad t \ge 0. \qquad (6.3)$$

From the desired relationship (6.3) there follow the relationships

$$\dot{x}(\theta + t) = \dot{y}(\tau(t)) \dot{\tau}(t), \qquad (6.4)$$

$$\ddot{x}(\theta + t) = \ddot{y}(\tau(t)) \dot{\tau}^2(t) + \dot{y}(\tau(t)) \ddot{\tau}(t). \qquad (6.5)$$

These relationships are equivalent to the inclusion

$$(\mathbf{x}(\theta + t), \mathbf{y}(\tau(t))) \in M_{\dot{\tau}(t) - 1}, \quad t \ge 0,$$

where $M_\varepsilon = \{ x = y, \ \dot{x} = \dot{y}(1 + \varepsilon) \}$.

From system (2.1), taking account of the relationships (6.4), (6.5), and (2.2), we deduce the explicit form of the pursuer's control:

$$u(\theta + t) = \frac{\alpha \dot{\tau}(t) - \beta \dot{\tau}^2(t)}{\rho} \dot{y}(\tau(t)) + \frac{1}{\rho} \dot{y}(\tau(t)) \ddot{\tau}(t) + \frac{\sigma}{\rho} \dot{\tau}^2(t) v(\tau(t)), \quad t \in [0, t^*]. \qquad (6.6)$$

From the equations (6.1), (6.2), taking (6.3), (6.4) into account, we obtain the equalities

$$x(\theta + t^*) = y(\tau(t^*)), \quad \dot{x}(\theta + t^*) = \dot{y}(\tau(t^*)). \qquad (6.7)$$

Thus, the soft landing is performed.

7    The  Problem  on  Holding  of  a  Trajectory

In performing a soft landing it is sometimes necessary that the equalities in geometric coordinates and velocities hold during some period of time.

We shall follow the line of reasoning of the previous section and choose the control of the pursuer from the condition of the coincidence of current velocities, analogously to (6.6). Then we obtain the control which ensures the holding of a trajectory in the set $M_0$ for $t \ge \theta + t^*$. The constraints on the process parameters imposed in Section 3 make such a choice possible.
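
The properties required of the time change in the pursuit in tracks can be checked numerically. The sketch below assumes the quadratic time change tau(t) = tau_* + (1 + eps) t - eps^2 t^2 / (4 (theta - tau_*)), which is the quadratic satisfying the boundary conditions of (6.2) with t^* = 2 (theta - tau_*) / eps, and evaluates a control obtained by substituting the tracking relationship into (2.1) and (2.2). The function names are hypothetical, and the quadratic form is a reading of the source that should be treated as an assumption.

```python
def make_time_change(tau_star, theta, eps):
    """Return tau(t), its first derivative, its constant second derivative, and t^*."""
    gap = theta - tau_star
    if gap <= 0 or not 0 < eps < 1:
        raise ValueError("need tau_star < theta and 0 < eps < 1")
    t_end = 2.0 * gap / eps

    def tau(t):
        return tau_star + (1.0 + eps) * t - eps ** 2 * t ** 2 / (4.0 * gap)

    def dtau(t):
        return 1.0 + eps - eps ** 2 * t / (2.0 * gap)

    ddtau = -eps ** 2 / (2.0 * gap)
    return tau, dtau, ddtau, t_end


def tracking_control(dy, v, dtau, ddtau, rho, sigma, alpha, beta):
    """Pursuer's control that keeps x(theta + t) = y(tau(t)).

    dy is the evader's velocity and v its control, both taken at time tau(t)
    and written as vectors with zero third component.
    """
    coef = (alpha * dtau - beta * dtau ** 2 + ddtau) / rho
    return [coef * d + sigma * dtau ** 2 * vi / rho for d, vi in zip(dy, v)]
```

At the end of the stage the time change has unit speed, so with equal friction coefficients the pursuer's control reduces to the evader's control scaled by sigma / rho.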

8    Conclusion

Thus, sufficient conditions are derived for the solvability of the soft-landing game problem (2.1), (2.2), (2.5) in a finite time starting from any initial state. The process of pursuit is partitioned into three stages. At the first stage the pursuit in geometric coordinates is performed on the active and passive segments, and the control is chosen in accordance with formulas (4.5), (4.6). If information is available only on the current position, the same goal may be attained by the methods contained in [5,19,25]. At the second stage the typical problem of transition from one point to another is solved. The control is chosen from the relationship (5.4); this can be realized on the basis of Pontryagin's Maximum Principle. Finally, the last stage is a pursuit in tracks. The control ensuring the soft landing is given by expression (6.6). What is more, the proximity of geometric coordinates and velocities may be maintained for as long a period of time as desired.

Abstract

An approach to linear differential games with an information time lag that is a function of time is presented. The approach consists in reducing a linear differential game with information time lag to an equivalent game with perfect information. This makes it possible to use all methods and approaches developed for perfect-information differential games to solve linear differential games with variable information time lag.

To apply the method connected with the time of "first absorption" to the reduced game, the method is extended to the nonstationary case. Sufficient conditions on the game parameters are derived under which one can construct a control that brings a trajectory of the game with information time lag to the terminal set in the "first absorption" time.

Keywords: time lag, terminal set, time of "first absorption", pursuer, evader.

1.  Introduction 

R. Isaacs, in his monograph [1] dedicated to the theory of differential games, emphasized the importance of problem statements in which one of the players receives the opponent's state vector with a time lag. In fact, such models describe physical dynamic situations of conflict most accurately. There exist a number of methods for solving differential games of pursuit [2-4], but all of them work under the assumption that the pursuer has perfect information about the current position of his adversary. In practice, however, such information may be available only with some delay in time, needed, for example, for processing the incoming data.

M. Giletti was the first to provide a rigorous formulation of a
broad class of two-person zero-sum differential games with
imperfect information in the form of a constant time lag [5,6].
He extended the so-called Hamilton-Jacobi theory of optimal
control and the main equation analysis, developed by Isaacs, to
the case of a payoff separable with respect to the opponent's
control parameter. In the case of separated motions of the
players, the differential game of pursuit with constant
information time lag immediately reduces to a perfect
information differential game [7].

In [8-10] the author suggested an approach to treating general
linear differential games with constant time lag, later extended
to the case of variable time lag [11-13], which made it possible
to use all methods developed for perfect information games
[2-4]. In this paper we address linear differential games with an
information time lag that appears as a function of time. We
outline an approach to studying this class of differential games
that consists in reducing the game to an equivalent one with
perfect information and then extending the known method [4] to
the nonstationary case.

2.  Equivalent  Game 

Let the motion of a conflict-controlled system be subject to the
linear differential equation

$$\dot z = Az + u + v, \tag{1}$$

where $z \in R^n$, $A$ is a matrix, $u \in U$, $v \in V$, $U \subset R^n$,
$V \subset R^n$. The terminal set

$$M, \quad M \subset R^n, \tag{2}$$

is given. Two players (the pursuer and the evader) choose
controls $u$ and $v$ in competition. The goal of the pursuer is to
drive a trajectory of the system to the terminal set $M$ in the
shortest time by means of his control choice. The goal of the
evader is the opposite.

Measurable functions with values in $U$ and $V$, respectively,
serve as admissible controls of the players. Denote the sets of
admissible controls by $\Omega_u$ and $\Omega_v$.

We shall analyze the development of the game from the
pursuer's point of view. The crux of the problem facing the
pursuer is the time lag on the availability of the current state
vector. This time lag appears as a function of time $\tau(t)$,
defined on the half-closed interval $[0, +\infty)$, where
$\tau(0) = \tau_0$, $\tau_0 > 0$. The function $\tau(t)$ is assumed to be
nonnegative and smooth, with a reasonable restriction on its
growth rate: $\dot\tau(t) < 1$. The last condition ensures that
current information about the system development is
continuously updated. On the initial time interval $[-\tau_0, 0)$ the
pursuer applies an arbitrary admissible control. At instant
$t = 0$ he comes to know the initial state $z^0$ of the system, and
besides he remembers his own control on $[-\tau_0, 0)$, denoted by
$u^0(\cdot)$. The pair $(z^0, u^0(\cdot))$ stands for the initial position of
the game. By the position of the game at instant $t$ is meant the
pair $(z(t - \tau(t)), u^t(\cdot))$, where $u^t(\cdot)$ is the realization of the
pursuer's control on the half-closed interval $[t - \tau(t), t)$. Let
$Z(t)$ be the set of all points attainable at instant $t$ from the
point $z(t - \tau(t))$ through $u^t(\cdot)$ and various admissible controls
of the evader on $[t - \tau(t), t)$. In view of the Cauchy formula we
have

$$Z(t) = \bar z(t) + V(\tau(t)), \tag{3}$$

where

$$\bar z(t) = e^{\tau(t)A} z(t - \tau(t)) + \int_{t-\tau(t)}^{t} e^{(t-\theta)A} u^t(\theta)\,d\theta, \tag{4}$$

$$V(\tau(t)) = \int_{0}^{\tau(t)} e^{\theta A} V\,d\theta. \tag{5}$$

We shall say that the differential game of pursuit with
information time lag can be terminated at time instant $t$,
$t > 0$, if there exists a strategy of the pursuer providing the
inclusion $Z(t) \subset M$ for any control of the evader. With regard
to (3)-(5), this inclusion can be rewritten in the form

$$\bar z(t) \in M(\tau(t)), \tag{6}$$

where

$$M(\tau(t)) = M \overset{*}{-} V(\tau(t)). \tag{7}$$

Above, the operation of geometric subtraction of sets [2] was
used:

$$X \overset{*}{-} Y = \{z : z + Y \subset X\}, \quad X, Y \subset R^n.$$

It is easily verified that on $[0, \infty)$ the vector $\bar z(t)$ satisfies
the differential equation

$$\dot{\bar z}(t) = A\bar z(t) + u(t) + (1 - \dot\tau(t))\,e^{\tau(t)A}\,v(t - \tau(t)) \tag{8}$$

with the initial condition

$$\bar z(0) = e^{\tau_0 A} z^0 + \int_{-\tau_0}^{0} e^{-\theta A} u^0(\theta)\,d\theta.$$

Reference to (4) shows that the pursuer has perfect information
about the current value of $\bar z(t)$.

Let us consider the perfect information differential game
described by the differential equation (8) and the terminal set
$M(\tau(t))$ (7). The above reasoning leads to the following
assertion.

Theorem 1. Let the set $M(\tau(t))$ be nonempty for
$t \in [0, \infty)$. The linear differential game (1), (2) with
information time lag $\tau(t)$ can be terminated at time $T$ from the
initial position $(z^0, u^0(\cdot))$ if and only if the corresponding
perfect information differential game (8) with terminal set
$M(\tau(t))$ can be terminated at the same time $T$.
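
The geometric subtraction used in (7) can be illustrated in one dimension, where closed intervals play the role of convex sets. The following is only a sketch under that simplifying assumption; the function name `geometric_difference` and the sample numbers are hypothetical and do not come from the paper.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def geometric_difference(x: Interval, y: Interval) -> Optional[Interval]:
    """Return {z : z + Y is contained in X} for closed intervals, or None if empty.

    z + [c, d] lies inside [a, b] exactly when a - c <= z <= b - d.
    """
    (a, b), (c, d) = x, y
    lo, hi = a - c, b - d
    return (lo, hi) if lo <= hi else None


# Terminal set M = [-1, 1]; uncertainty set V = [-0.25, 0.25] accumulated over the lag.
shrunk = geometric_difference((-1.0, 1.0), (-0.25, 0.25))
# If the uncertainty is wider than M, the reduced terminal set is empty.
too_big = geometric_difference((-1.0, 1.0), (-2.0, 2.0))
```

In this toy case the terminal set shrinks by the size of the uncertainty set, which is why Theorem 1 requires $M(\tau(t))$ to be nonempty.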

From  this  theorem  it  follows  that  all  methods  and 
approaches,  developed  for  differential  games  with 
perfect  information,  can  be  applied  to  solve  linear 
differential  games  with  variable  information  time 
lag. 
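
The reduction can be checked numerically in a scalar setting. The sketch below is an illustration under assumed data (a scalar $A$, a particular lag $\tau(t)$ and particular controls, none of which come from the paper): it computes $\bar z(t)$ once directly from formula (4) and once by integrating equation (8), and the two values should agree up to discretization error.

```python
import math

A = -0.5            # scalar stand-in for the matrix A (assumed value)
TAU0 = 0.3          # initial lag tau(0)


def tau(t):         # assumed lag: smooth, nonnegative, derivative below 1
    return 0.3 + 0.2 * math.sin(t)


def dtau(t):
    return 0.2 * math.cos(t)


def u(t):           # pursuer's control (assumed), known to the pursuer
    return math.cos(t)


def v(t):           # evader's control (assumed), used only to simulate the state
    return 0.5 * math.sin(2.0 * t)


def rk4(f, t0, y0, t1, n=2000):
    """Integrate y' = f(t, y) from t0 to t1 with n classical Runge-Kutta steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def z_true(t, z0):
    """State of system (1) at time t, started from z0 at time -TAU0."""
    return rk4(lambda s, z: A * z + u(s) + v(s), -TAU0, z0, t)


def z_bar_direct(t, z0, n=2000):
    """Formula (4): the delayed state propagated with the pursuer's own control."""
    lo = t - tau(t)
    h = (t - lo) / n
    g = lambda th: math.exp((t - th) * A) * u(th)
    integral = h * (0.5 * g(lo) + sum(g(lo + i * h) for i in range(1, n)) + 0.5 * g(t))
    return math.exp(tau(t) * A) * z_true(lo, z0) + integral


def z_bar_ode(t, z0):
    """Equation (8), integrated from the initial condition z_bar(0)."""
    rhs = lambda s, zb: A * zb + u(s) + (1 - dtau(s)) * math.exp(tau(s) * A) * v(s - tau(s))
    return rk4(rhs, 0.0, z_bar_direct(0.0, z0), t)


direct = z_bar_direct(2.0, 1.0)
via_ode = z_bar_ode(2.0, 1.0)
```

The evader's control enters (8) only through the delayed value $v(t - \tau(t))$ scaled by $1 - \dot\tau(t)$, which is the role played by the growth restriction $\dot\tau(t) < 1$.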

3.  Time  of  "First  Absorption" 

To apply the method connected with the "first absorption" time
[3, 4] to the solution of the perfect information differential
game (7), (8), we need an extension of this method to the
nonstationary case. Below we provide conditions on the
parameters and the initial position of a linear nonstationary
differential game with perfect information which are sufficient
for bringing a trajectory of the game to a time-dependent
terminal set in the "first absorption" time. These conditions
are an extension of the well-known result [4].

Let the motion of a conflict-controlled system be described by
the following equation

$$\dot z(t) = A(t)z + B(t)u + C(t)v. \tag{9}$$

Here $z \in R^n$, $u \in U$, $v \in V$, $U$ and $V$ are convex compacts,
$U \subset R^r$, $V \subset R^s$, and $A(t)$, $B(t)$, $C(t)$ are matrices of
order $n \times n$, $n \times r$, and $n \times s$, respectively, whose
elements are continuous functions of time.

The goal of the pursuer is to provide, in the shortest time, the
inclusion $z(t) \in M(t)$, where $M(t)$ is a nonempty convex set,
closed for all $t \in [t_0, \infty)$.

It is assumed that the support function of the set $M(t)$,

$$W_{M(t)}(\psi) = \sup_{x \in M(t)} (\psi, x),$$

is continuous in $\psi$ on the set

$$K_{M(t)} = \{\psi : \psi \in R^n,\ W_{M(t)}(\psi) < +\infty\}$$

and that the convex cone $K_{M(t)}$ is closed for $t \in [t_0, \infty)$.

We call the pair $(t, z(t))$ the position of the game at time
instant $t$. Let the initial position $(t_0, z_0)$ be given. We shall
say that this game with perfect information can be terminated in
time $T(t_0, z_0)$ from the initial position $(t_0, z_0)$ if there
exists a strategy of the pursuer bringing a trajectory of the
process to the terminal set at some time instant $t_1$,
$t_1 \le T(t_0, z_0)$, that is,

$$z(t_1) \in M(t_1).$$

Set

$$W(\psi, t, z, s) = (\Theta^*(s,t)\psi, z) + \int_t^s \max_{u \in U} (\Theta^*(s,\theta)\psi, B(\theta)u)\,d\theta + \int_t^s \min_{v \in V} (\Theta^*(s,\theta)\psi, C(\theta)v)\,d\theta, \tag{10}$$

$$\lambda(t, z, s) = \min_{\|\psi\| = 1} \left[ W(\psi, t, z, s) + W_{M(s)}(-\psi) \right], \tag{11}$$

$$\Lambda(t, z, s) = \{\psi : \|\psi\| = 1,\ W(\psi, t, z, s) + W_{M(s)}(-\psi) = \lambda(t, z, s)\}. \tag{12}$$

Here $\Theta(s,t)$ is the matrizant of the homogeneous system
$\dot z = A(t)z$, and $\Lambda(t, z, s) \subset -K_{M(s)}$. If $z(t) \in M(t)$,
then from the definition of the support function it follows that

$$\lambda(t, z(t), t) = \min_{\|\psi\| = 1} \left[ (\psi, z(t)) + W_{M(t)}(-\psi) \right] \ge 0.$$
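
The sign of $\lambda(t, z, t)$ as a membership test can be tried out on a simple terminal set. The sketch below assumes, purely for illustration, that $M$ is a closed disc in the plane (so its support function is known in closed form) and approximates the minimum over unit vectors $\psi$ by sampling directions; the names are hypothetical.

```python
import math


def support_ball(psi, center, radius):
    """Support function of a closed ball: W(psi) = (psi, center) + radius * ||psi||."""
    dot = sum(p * c for p, c in zip(psi, center))
    return dot + radius * math.hypot(*psi)


def lam_at_current_time(z, center, radius, n_dirs=3600):
    """Approximate min over unit psi of (psi, z) + W_M(-psi) in the plane."""
    best = float("inf")
    for k in range(n_dirs):
        ang = 2.0 * math.pi * k / n_dirs
        psi = (math.cos(ang), math.sin(ang))
        neg = (-psi[0], -psi[1])
        val = psi[0] * z[0] + psi[1] * z[1] + support_ball(neg, center, radius)
        best = min(best, val)
    return best


inside = lam_at_current_time((0.5, 0.0), (0.0, 0.0), 1.0)    # z belongs to M
outside = lam_at_current_time((2.0, 0.0), (0.0, 0.0), 1.0)   # z lies outside M
```

For a disc of radius $r$ centered at $c$ the minimum equals $r - \|z - c\|$, so it is nonnegative exactly when $z$ belongs to the set.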

Denote by $T(t, z)$ the least root of the equation
$\lambda(t, z, s) = 0$ in $s$ that is greater than or equal to $t$. If
there is no such root we set $T(t, z) = +\infty$. Evidently
$T(t, z) = t$ only for $z(t) \in M(t)$. Unlike the stationary case,
where $T(z)$ means the time that is left before absorption of the
terminal set by the attainable set, in the nonstationary case
$T(t, z)$ is exactly the time instant of this absorption.

Theorem 2. For any position $(t_1, z_1)$ such that
$T(t_1, z_1) < +\infty$ let the following conditions hold:

a) the set $\Lambda(t, z, T(t_1, z_1))$ consists of a unique vector
$\psi(t, z, T(t_1, z_1))$ for all $(t, z)$ from a neighborhood of
$(t_1, z_1)$;

b) for all points from this neighborhood the maximum

$$\max_{u \in U} \left( \Theta^*(T(t_1, z_1), t)\,\psi(t, z, T(t_1, z_1)),\ B(t)u \right)$$

is furnished by a unique vector $u(t, z, T(t_1, z_1))$.

Then the pursuer can terminate the game in time $T(t_0, z_0)$.

Proof. Let the initial position $(t_0, z_0)$ be given and
$T(t_0, z_0) < +\infty$. By the assumptions of the theorem there
exists a neighborhood of the point $(t_0, z_0)$ where the vector
$\psi(t, z, T_0)$, $T_0 = T(t_0, z_0)$, is uniquely defined. Let us
denote this neighborhood by $\Omega(t_0, z_0)$. Since the function
$W(\psi, t, z, T_0) + W_{M(T_0)}(-\psi)$ is continuous in $t$, $z$, and
$\psi$, the function $\psi(t, z, T_0)$ is continuous in $t$, $z$, and
the control $u^0(t, z, T_0)$ furnishing the maximum to the
expression

$$(\Theta^*(T_0, t)\,\psi(t, z, T_0),\ B(t)u)$$

is continuous in $t$, $z$. We assign the pursuer's control inside
$\Omega(t_0, z_0)$ to be of the form $u = u^0(t, z, T_0)$. Consider the
equation

$$\dot z = A(t)z + B(t)u^0(t, z, T_0) + C(t)v(t), \quad z(t_0) = z_0. \tag{13}$$

For any control $v(t)$ of the evader this equation has a solution
$z^0(t)$ which can be extended up to the boundary of
$\Omega(t_0, z_0)$. Thus, a trajectory of (13) is constructed on some
time interval $[t_0, t_1]$. We now prove the following inequality:

$$T(t_1, z^0(t_1)) \le T(t_0, z_0). \tag{14}$$

To do this it will suffice to show that

$$\lambda(t, z^0(t), T(t_0, z_0)) \ge 0, \quad t_0 \le t \le t_1. \tag{15}$$

Since $\Lambda(t, z, T_0)$ consists of a unique vector $\psi(t, z, T_0)$
inside $\Omega(t_0, z_0)$, the function $\lambda(t, z, T_0)$ has continuous
partial derivatives in $t$ and $z$ there. We now evaluate the time
derivative of $\lambda(t, z^0(t), T_0)$, taking into account the
pursuer's control choice on $[t_0, t_1]$:

$$\begin{aligned}
\frac{d\lambda(t, z^0(t), T_0)}{dt} ={}& -\left(A^*(t)\,\psi(t, z^0(t), T_0),\ \Theta(T_0, t)\,z^0(t)\right) \\
& - \max_{u \in U} \left(\Theta^*(T_0, t)\,\psi(t, z^0(t), T_0),\ B(t)u\right) \\
& - \min_{v \in V} \left(\Theta^*(T_0, t)\,\psi(t, z^0(t), T_0),\ C(t)v\right) \\
& + \left(\Theta^*(T_0, t)\,\psi(t, z^0(t), T_0),\ A(t)z^0(t) + B(t)u^0(t, z^0(t), T_0) + C(t)v(t)\right).
\end{aligned}$$

It is seen that the above expression is nonnegative on
$[t_0, t_1]$, that is, the function $\lambda(t, z^0(t), T_0)$ does not
decrease along $z^0(t)$, $t \in [t_0, t_1]$. Since
$\lambda(t_0, z_0, T_0) \ge 0$, formula (15) is true. In a similar way,
beginning with the position $(t_1, z^0(t_1))$, we shall construct a
solution of (13) on some interval $[t_1, t_2]$ such that

$$T(t_2, z^0(t_2)) \le T(t_1, z^0(t_1)).$$

It is easy to see that the construction of the trajectory can be
prolonged until $T(t, z^0(t)) = t$, that is, until the trajectory
reaches the terminal set. By virtue of the inequality (14), which
holds true in the course of the pursuit, this will occur no later
than at time $T(t_0, z_0)$. The theorem is proved.

Substituting into the expressions given by formulas (10)-(12)
the parameters of the equivalent game (7), (8),

$$\Theta(s, t) = e^{(s-t)A}, \quad B(t) = I, \quad C(t) = (1 - \dot\tau(t))\,e^{\tau(t)A},$$

$$\bar z^0 = e^{\tau_0 A} z^0 + \int_{-\tau_0}^{0} e^{-\theta A} u^0(\theta)\,d\theta,$$

we denote them by $W_1(\psi, t, s, \bar z)$, $\lambda_1(t, s, \bar z)$,
$\Lambda_1(t, s, \bar z)$, respectively.

Let $T_1(t, \bar z)$ be the least root of the equation in $s$

$$\lambda_1(t, s, \bar z) = 0$$

which is greater than or equal to $t$.

We suppose that for all $t \in [0, \infty)$ the support function of
the set $M(\tau(t))$, namely $W_{M(\tau(t))}(\psi)$, is continuous on the
set $K_{M(\tau(t))} = \{\psi : \psi \in R^n,\ W_{M(\tau(t))}(\psi) < +\infty\}$
and the set $K_{M(\tau(t))}$ is closed. Clearly Theorem 2 holds true
when $\bar z$ is substituted for $z$. In view of Theorems 1 and 2 we
come to the following result.

Theorem 3. Let the assumptions of Theorem 2 hold for the game
(7), (8). Then the linear differential game (1), (2) with time lag
$\tau(t)$ can be terminated from the initial position
$(z^0, u^0(\cdot))$ in time $T_1(0, \bar z^0)$.

4. Example

Below, one model game [1], complicated by a variable
information time lag on the availability of the object state
vector, is solved to illustrate the result obtained.

Let the motion of the object be subject to the system of
equations

$$\dot z_1 = z_2, \quad \dot z_2 = u, \quad \dot z_3 = v, \tag{16}$$

where $z_i \in R^n$, $i = 1, 2, 3$, $\|u\| \le 1$, $\|v\| \le 1$, $n \ge 2$.

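We consider the game terminated as soon as
$\|z_1 - z_3\| \le \varepsilon$ ($\varepsilon$-capture in geometric coordinates).

Let $(z_1^0, z_2^0, z_3^0, u^0(\cdot))$ be the initial position of the game.
A function $\tau(t)$, $\tau(t) \in C^1[0, \infty)$, $\tau(t) < \varepsilon$,
$\tau(0) = \tau_0$, is given. Here

$$M = \{z : \|z_1 - z_3\| \le \varepsilon\}. \tag{17}$$

Then

$$K_M = \{\psi : \psi_1 = -\psi_3,\ \psi_2 = 0,\ \psi_i \in R^n,\ i = 1, 2, 3\}$$

and on $K_M$ we have $W_M(\psi) = \varepsilon\|\psi_1\|$. Performing the
calculations we obtain

$$W(\psi, t, \bar z^0) = (\psi_1, \bar z_1^0 + t\bar z_2^0 - \bar z_3^0) + \int_0^t \max_{\|u\| \le 1} (\theta\psi_1, u)\,d\theta + \int_0^t \min_{\|v\| \le 1} \left[(1 - \dot\tau(\theta))(-\psi_1, v)\right] d\theta.$$

We observe that $u = v = \psi_1 / \|\psi_1\|$ furnishes the maximum and
the minimum in the above expression. Therefore

$$W(\psi, t, \bar z^0) = (\psi_1, \bar z_1^0 + t\bar z_2^0 - \bar z_3^0) + \|\psi_1\| \left( \frac{t^2}{2} - \tau_0 - t + \tau(t) \right),$$

and for $\psi \in K_M$

$$W_{M(\tau(t))}(\psi) = \|\psi_1\| (\varepsilon - \tau(t)),$$

$$\lambda(t, \bar z^0) = \min_{\|\psi\| = 1} \left[ (\psi_1, \bar z_1^0 + t\bar z_2^0 - \bar z_3^0) + \|\psi_1\| \left( \frac{t^2}{2} - t - \tau_0 + \varepsilon \right) \right].$$

The minimum in the above expression is furnished by the vector

$$\psi_1(t, \bar z^0) = -\frac{\bar z_1^0 + t\bar z_2^0 - \bar z_3^0}{\sqrt{2}\,\|\bar z_1^0 + t\bar z_2^0 - \bar z_3^0\|},$$

provided $\|\bar z_1^0 + t\bar z_2^0 - \bar z_3^0\| \ne 0$. Then
$\Lambda(t, \bar z^0)$ consists of a unique vector

$$\psi(t, \bar z^0) = (\psi_1(t, \bar z^0),\ 0,\ -\psi_1(t, \bar z^0)), \quad \|\psi_1\| = 1/\sqrt{2},$$

and $T(0, \bar z^0) = T_1(z^0, u^0(\cdot))$ can be found from the equation

$$\|\bar z_1^0 + t\bar z_2^0 - \bar z_3^0\| = \frac{t^2}{2} - t + \varepsilon - \tau_0, \tag{18}$$

where

$$\bar z_1^0 = z_1^0 + \tau_0 z_2^0 + \int_{-\tau_0}^{0} (-\theta)\,u^0(\theta)\,d\theta, \quad \bar z_2^0 = z_2^0 + \int_{-\tau_0}^{0} u^0(\theta)\,d\theta, \quad \bar z_3^0 = z_3^0.$$

In order that condition a) of Theorem 3 (Theorem 2) be
satisfied it suffices that the inequality

$$\frac{t^2}{2} - \tau_0 - t + \varepsilon > 0$$

holds for $t \ge 0$. In turn, the latter holds when
$\varepsilon > \tau_0 + 1/2$, since the minimum of $t^2/2 - t$ over $t \ge 0$
equals $-1/2$. Under these conditions the maximizing control
$u$, directed along $\psi_1(t, \bar z)$, is uniquely defined, that is,
condition b) of Theorem 3 (Theorem 2) is satisfied. From
Theorems 1 and 3 it follows that when $\varepsilon > \tau_0 + 1/2$ the game
(16), (17) can be terminated from the initial position
$(z_1^0, z_2^0, z_3^0, u^0(\cdot))$ in time $T_1(z^0, u^0(\cdot))$, defined as the
first positive root of the equation (18).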
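
The first positive root of (18) is easy to locate numerically. The sketch below is a hypothetical illustration: the helper `first_root`, the scan step and the sample vectors are assumptions, and the inputs are taken to be the already transformed initial vectors $\bar z_1^0$, $\bar z_2^0$, $\bar z_3^0$.

```python
import math


def first_root(z1, z2, z3, eps, tau0, t_max=50.0, step=1e-3):
    """First root t >= 0 of ||z1 + t*z2 - z3|| = t^2/2 - t + eps - tau0 (equation (18)).

    The time axis is scanned with a fixed step until the left side no longer
    exceeds the right side; the crossing is then refined by bisection.
    """
    def gap(t):
        dist = math.sqrt(sum((a + t * b - c) ** 2 for a, b, c in zip(z1, z2, z3)))
        return dist - (t * t / 2.0 - t + eps - tau0)

    if gap(0.0) <= 0.0:
        return 0.0                        # already inside the reduced capture radius
    t_prev, t = 0.0, step
    while t <= t_max:
        if gap(t) <= 0.0:                 # sign change bracketed in [t_prev, t]
            lo, hi = t_prev, t
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if gap(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return hi
        t_prev = t
        t += step
    return math.inf                       # no root found on [0, t_max]


# Assumed data: pursuer and evader initially at rest, 3 units apart,
# eps = 1 and tau0 = 0.25, so that eps > tau0 + 1/2 holds.
root = first_root((3.0, 0.0), (0.0, 0.0), (0.0, 0.0), eps=1.0, tau0=0.25)
```

For these sample values equation (18) reduces to $t^2 - 2t - 4.5 = 0$, whose positive root is $1 + \sqrt{5.5}$.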
ABSTRACT

Traditional  game  theory  is  a  normative  science  and  is  not 
meant  for  modeling  the  real  behavior  of  players.  This  paper  describes 
a  method  the  goal  of  which  is  to  predict  the  choices  of  players  in  real 
situations  rather  than  to  compute  optimal  decisions.  It  is  assumed  that 
each  player  faces  a  choice  between  two  strategies:  active  and  passive. 
The  method  is  based  on  structural  representation  of  a  subject  together 
with  his  images  of  the  self  and  another.  This  representation  allows  us 
to  compose  systems  of  equations  whose  solutions  are  the  probabilities 
with  which  the  players  choose  the  alternative  strategies. 

KEYWORDS:  reflexive  games,  decision  making,  dynamic 
systems 

1.  THE  TRADITIONAL  APPROACH  TO 
GAMES 

Game theory allows us to construct models of the ideal
behavior of players seeking to guarantee themselves an optimum
outcome. At its birth game theory seemed able to serve not
only as a basis for rational decision making but also as a tool for
modeling the real behavior of players. This approach looked
promising at first because laboratory experiments showed that in
certain cases game theory can predict players' choices. Most
attempts to use game theory for modeling real strategic behavior
with the purpose of predicting it have failed, however, and are
no longer undertaken. The reason for this negative result is
that real decisions in strategic situations cannot be explained
only by the players' desire to guarantee an optimum outcome.
Today we still face the necessity of creating a general game
theory capable of predicting human choice under conditions of
conflict of interests.

2.  AN  ALTERNATIVE  APPROACH 

In real games players make decisions based on images
of themselves and of their partners. These images are reflexive:
a subject can "see" himself and others, together with the images
they have. A simple reflexive structure is represented in Fig. 1.
Rectangle $a_1$ represents a player who performs an action $X_1$;
rectangle $a_2$ is his image of the self and $y_2$ his image of his
partner. Symbol * is the image of their relationship, and
rectangle $x_3$ is the subject's knowledge about his image of
himself. In psychological terms, this rectangle holds the
player's conscious plans and intentions.

Figure 1. The subject's reflexive structure

In 1966 I put forth the hypothesis that an alternative
game theory could be based on schemes similar to that in Fig. 1
[1]. At that time it was only an idea. To construct a theory of
"reflexive games" one needs to know the functions which
connect the rectangles in Fig. 1. These functions must be
universal in relation to various human activities, that is, they
must reflect the psycho-anthropological fundamentals of human
choice. In my subsequent work [2,3] I argue that these basic
functions are the following:

$$z_1 = x^y \overset{\text{def}}{=} 1 - y + yx, \tag{1}$$

$$z_2 = x \bullet y \overset{\text{def}}{=} xy, \tag{2}$$

$$z_3 = x \oplus y \overset{\text{def}}{=} x + y - xy, \tag{3}$$

where $x, y \in [0,1]$.

The value of each variable is interpreted as a "measure
of positivity." Function $z_1$ describes the correlation between a
player's reflexion and his state: the value of $z_1$ is the measure of
the player's positivity when the world influences him with the
measure $x$ and he "sees" a situation with the measure $y$. Symbols
$\bullet$ and $\oplus$ correspond to relationships between the players,
which can be of only two types: strong ($\bullet$) or weak ($\oplus$). The
values of $z_2$ and $z_3$ are the measures of positivity of situations
in which players with the measures $x$ and $y$ are involved.
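
The three basic functions are simple enough to write down directly. The sketch below is only an illustration; the Python names `implication`, `strong` and `weak` are labels chosen here, not the author's terminology.

```python
def implication(x: float, y: float) -> float:
    """Function (1): x**y in the paper's notation, defined as 1 - y + y*x."""
    return 1.0 - y + y * x


def strong(x: float, y: float) -> float:
    """Function (2): strong relationship, the product x*y."""
    return x * y


def weak(x: float, y: float) -> float:
    """Function (3): weak relationship, x + y - x*y."""
    return x + y - x * y


# On the corner values 0 and 1 the three functions reduce to Boolean tables.
corners = [(x, y) for x in (0.0, 1.0) for y in (0.0, 1.0)]
table = {(x, y): (implication(x, y), strong(x, y), weak(x, y)) for x, y in corners}
```

At the corners, function (1) behaves like the Boolean implication "y implies x", function (2) like conjunction, and function (3) like disjunction; inside the unit square they interpolate between these values.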

Now we can begin developing the theory of reflexive games.

3. THE REPRESENTATION OF A PLAYER

We assume that every player faces a choice between
two strategies: an active strategy and a passive one. An active
strategy is a positive pole, a passive strategy is a negative pole.
The relationship between players is described by the operation

$$x \; p \; y \overset{\text{def}}{=} p\,(x \bullet y) + (1 - p)(x \oplus y), \tag{4}$$

where $p \in [0,1]$.

This operation is a generalization of operations (2) and
(3). Now the reflexive structure in Fig. 1 can be assigned
functional meaning and written as the following expression:

$$a_1^{\,(a_2^{x_3} \; p \; y_2)} = X_1, \tag{5}$$

where $a_1, a_2, x_3, y_2, p \in [0,1]$.

It follows from (5) that $X_1 \in [0,1]$. The value of $a_1$ is
interpreted as the measure of the external world's pressure
toward an active strategy on the non-conscious level; the value
of $a_2$ is the measure of pressure of which the player is aware. The
value of $y_2$ is interpreted as the measure of an opponent's
readiness to choose an active strategy, from the player's point of
view. The value of $p$ is the player's subjective estimate
of the extent of his involvement in a relationship with
an opponent. In a situation of conflict the value of $p$ can be
interpreted as the intensity of confrontation. The value of $x_3$ is
the player's plan or intention, and the value of $X_1$ is the objective
readiness of the player to use the active strategy. This value can
be interpreted as the probability of the player's choosing an
active strategy. Therefore, $X_1$ corresponds to the real choice and
$x_3$ to the player's subjective intention.

Further we will consider a player who performs
conscious actions, that is, he uses only such plans $x_3$ for which
the following equation holds:

$$x_3 = X_1. \tag{6}$$

With condition (6), expression (5) turns into an
equation in $x$:

$$a_1^{\,(a_2^{x} \; p \; y_2)} = x. \tag{7}$$

Theorem 1.

For any $a_1, a_2, p, y_2 \in [0,1]$ equation (7) has a root $x_0 \in [0,1]$.

4. SAMPLE APPLICATION

Let two companies, A and B, be in conflict. A blames
B for failing to fulfill its obligations. Company A can either use
an active strategy, that is, sue the other firm, or a passive one,
that is, refrain from suing. Company B also has two strategies:
active, to file a counter-claim, and passive, to refrain from filing.

Further we will consider the situation from B's point of
view. Let B not want A to file a suit. A consultant hired by B
advises threatening A with a counter-claim. To do so, B must
publicly dramatize the situation, using the mass media. But these
actions are expensive. B's executives face a dilemma: to accept
the consultant's advice or to refuse it. The theory of reflexive
games can help B. The analysts from company B have to
construct a model of their opponent based on equation (7). Let
it be known that A's lawyers are trying to push the executives to
start the lawsuit, so B's analysts assume

$$a_2 > 0. \tag{8}$$

Since there are no signs that A's lawyers can covertly influence
the executives' subconscious,

$$a_1 = a_2. \tag{9}$$

The value of $p$ is the degree of dramatization of a conflict. When
$p = 1$, dramatization is maximal; when $p = 0$, it is minimal. The
value of $y_2$ in equation (7) is B's readiness, from A's point of
view, to file a counter-claim. The more B dramatizes the
situation and threatens A with a counter-claim, the greater $y_2$. To
reflect this in the model, B's analysts must choose a
monotonically increasing function $y_2 = \varphi(p)$. The simplest
function of this type is

$$y_2 = p. \tag{10}$$

In solving equation (7) with limitations (8), (9), and (10), the
analysts find the probability of A's filing a lawsuit as a function
of $p$:

$$x = \frac{1 - (1 - a_1)(1 - p + p^2)}{1 - (1 - a_1)^2 (1 - 2p + 2p^2)}. \tag{11}$$

Mathematical analysis shows that function (11) on the interval
[0,1] has its minimum at $p = 1$ and $p = 0$, that is, either when the
dramatization of a conflict is at its greatest or when it is absent.
This answer solves the dilemma: company B must reject its
consultant's advice and not spend money on dramatizing its
relationship with firm A.

While modeling the player's behavior we used very
little specific information about him or about the situation. This
is possible only because the model is based on the fundamentals
of human choice. Equations (1) through (4) contain this
information.
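
The closed form (11) can be cross-checked against a direct numerical solution of equation (7). The sketch below is an illustration only: the helper names, the bisection solver and the sample value $a_1 = a_2 = 0.5$ are assumptions made here, not part of the original analysis.

```python
def implication(x, y):            # function (1): x**y := 1 - y + y*x
    return 1.0 - y + y * x


def relation(x, y, p):            # operation (4): p*(x*y) + (1-p)*(x + y - x*y)
    return p * (x * y) + (1.0 - p) * (x + y - x * y)


def solve_eq7(a1, a2, p, y2, iters=200):
    """Root of equation (7), a1**((a2**x) p y2) = x, found by bisection on [0, 1]."""
    def residual(x):
        inner = relation(implication(a2, x), y2, p)
        return implication(a1, inner) - x
    lo, hi = 0.0, 1.0             # residual(0) >= 0 and residual(1) <= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def formula_11(a1, p):
    """Closed form (11) for the case a1 = a2 and y2 = p."""
    num = 1.0 - (1.0 - a1) * (1.0 - p + p * p)
    den = 1.0 - (1.0 - a1) ** 2 * (1.0 - 2.0 * p + 2.0 * p * p)
    return num / den


a = 0.5                           # assumed pressure level, for illustration only
curve = [(k / 10, formula_11(a, k / 10)) for k in range(11)]
```

For this sample value the curve is symmetric about $p = 1/2$, takes its smallest value $1/(2 - a)$ at both ends of the interval, and is larger in between, in line with the conclusion drawn from (11).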

5. GAME REPRESENTATION

Let two interacting players have correct images of each
other. Then the following system of equations corresponds to the
game:

$$a_1^{\,(a_2^{x} \; p \; y)} = x, \qquad b_1^{\,(b_2^{y} \; q \; x)} = y. \tag{12}$$

Theorem 2.

For any $a_1, a_2, p, b_1, b_2, q \in [0,1]$ the system of equations (12) has
a solution $x_0, y_0 \in [0,1]$.

Consider an example: let $a_1 = 1/3$, $a_2 = 3/4$, $b_1 = 1/3$,
$b_2 = 1/2$, $p = 1$, and $q = 1$. Then the system is

$$\left(\tfrac{1}{3}\right)^{(3/4)^{x} \bullet\, y} = x, \qquad \left(\tfrac{1}{3}\right)^{(1/2)^{y} \bullet\, x} = y. \tag{13}$$

In transforming these equations into a standard form (by (1), for
instance, $(3/4)^x = 1 - x/4$) we obtain a system of two non-linear
equations:

$$6 - 6x - 4y + xy = 0, \qquad 3 - 2x - 3y + xy = 0. \tag{14}$$

This system has two solutions, but only one of them has roots in
the interval [0,1]:

$x_0 = 0.557$, $y_0 = 0.772$.

The model helps us to find that, under the conditions reflected
by the system (13), the first player chooses an active strategy
with the probability 0.557, and the second one with the
probability 0.772. Generally, a game corresponds to the system

$$f_1(a_1, a_2, p, y, x) = x, \qquad f_2(b_1, b_2, q, x, y) = y, \tag{15}$$

whose parameters and variables may be related functionally.

Using the theory of reflexive games it is also possible
to construct dynamic models of games. To do so one has to use
systems of equations of the following type:

$$F_1(y_n, x_n) = x_{n+1}, \qquad F_2(x_n, y_n) = y_{n+1}. \tag{16}$$

From the mathematical point of view the theory of
reflexive games is connected with non-linear equations and
dynamic systems.
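
The example values can be reproduced by iterating the two equations of system (13). The sketch below is an illustration under stated assumptions: using the right-hand sides of (13) as the update maps of a scheme of type (16), and the starting point $(0.5, 0.5)$, are choices made here and are not taken from the paper.

```python
def implication(x, y):            # function (1): x**y := 1 - y + y*x
    return 1.0 - y + y * x


def step(x, y):
    """One simultaneous update with the left-hand sides of system (13) (p = q = 1)."""
    x_next = implication(1.0 / 3.0, implication(3.0 / 4.0, x) * y)
    y_next = implication(1.0 / 3.0, implication(1.0 / 2.0, y) * x)
    return x_next, y_next


x, y = 0.5, 0.5                   # arbitrary starting point (assumption)
for _ in range(500):
    x, y = step(x, y)

residual_1 = 6 - 6 * x - 4 * y + x * y   # first equation of (14)
residual_2 = 3 - 2 * x - 3 * y + x * y   # second equation of (14)
```

Subtracting the two equations of (14) gives $y = 3 - 4x$, and substitution leads to $4x^2 - 13x + 6 = 0$, whose root in [0,1] is $x = (13 - \sqrt{73})/8$; the iteration settles on this point.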
ABSTRACT

The need to empower end users with programming capacities in
order to adapt and extend software applications brings forth the
opportunity to examine and explore semiotic aspects in computing.
This  paper  discusses  some  of  these  aspects  within  a  knowledge- 
based  extensible  environment  where  (a)  software  designers  can 
explain  their  rationale  to  users  and  (b)  differentiated  discourse  for 
user-system  interaction  and  designer-user  communication  fosters  a 
better  understanding  of  the  application's  model  and  features. 


KEYWORDS: Semiotic Engineering, End User Programming,
Knowledge Representation


1.  Introduction 

A  growing  need  to  empower  computer  users  so  that  they  can 
customize,  extend,  and  create  software  has  been  fuelling 
research  in  such  fields  as  Human-Computer  Interaction 
(HCI),  End-User  Programming  (EUP),  and  Artificial 
Intelligence  (AI).  Software  Engineering  is  rapidly  revealing 
its  semiotic  nature  as  programs  and  systems  produce, 
interpret,  and  become  themselves  documents  and  messages 
which  groups  of  people  can  directly  or  indirectly  exchange. 
As a sequel to previous work on the Semiotic
Engineering of User Interface Languages [5], this paper
broadens  the  scope  of  computer  language  and  message 
design  in  order  to  try  and  meet  the  challenges  of  wider 
computer  literacy.  It  adopts  a  communication-centered 
architecture  to  support  the  users'  learning  processes  required 
for  programming  and  uses  differentiated  agent-designer 
discourse  to  help  users  grasp  the  intended  application's 
model. 

The main point in this work is a finer structuring of
communication in interface dialogues (involving users,
systems, and designers), which promotes programmers and
systems designers to the role of writers of material to be
learned, adapted, and used as inspiration by computational
readers. This is our interpretation of the computer literacy
challenge facing the software industry. We unite HCI and AI
under the cover of Semiotics and discuss some communicative
requirements and alternatives for intelligent extensible
computer applications. We do not, however, discuss the
features of the underlying macro language in which users can
codify extensions to the application.

The  reported  work  is  part  of  an  ongoing  research  project 
aiming  at  the  construction  of  a  fully-functional  EUP 
environment.  The  global  idea  is  presented  in  Section  2,  and 
specific  implementation  aspects  are  raised  and  discussed  in 
Section  3,  with  an  emphasis  on  representation  languages. 
Section 4 presents our current conclusions about the
linguistic requirements for a communication-centered
architecture to stand and be useful, as well as the benefits we
identify in this proposal.

2. A Communication-Centered Architecture

User  Interfaces  can  be  seen  as  meta-communications 
artifacts.  They  obviously  send  and  receive  messages  during 
interaction  with  users,  but  they  are  themselves  an  achieved 
one-shot  message  sent  from  system  designer  to  system  user. 
This  idea  is  central  to  Semiotic  Engineering  [5],  a 
theoretically-based  approach  to  designing  user  interface 
languages  and  codes.  However,  user  interfaces  must  now 
support  not  only  productive  user-system  interaction,  but  also 
end-user  programming.  This  can  be  achieved  by  more  or  less 
sophisticated  techniques  for  configuration  settings,  macro 
recordings,  or  programming  in  the  small  [12;  15],  as  well  as 
by  some  knowledge-based  (KB)  programming  by 
demonstration  [4]. 

For  users  to  realize  that  they  need  and  can  re-configure,  re- 
program,  or  extend  software,  they  first  need  to  understand  its 
meaning,  and  then  understand  how  new  meanings  can  be 
added  to  it  or  replace  old  meanings.  Such  laborious  processes 
of  interpretation  should  be  facilitated  and  supported  by 
conscious  Semiotic  Engineering  of  computational  languages. 
Thus,  users  should  not  only  be  able  to  make  sense  of 
designers'  intents,  but  also  express  and  achieve  their  own 
meanings. 

Designers always predict and prescribe communication form
and content. Good designers make good predictions and
write good prescriptions. However, users are seldom (if ever)
deliberately made aware of the arbitrary rules and models
that underlie applications. In reaction to an excessive stress
on grammatical rule-learning imposed by the command
language interface style, current user interface guidelines
seem to be heading toward some elusive minimal rule-learning
ideal. Popular Graphic User Interface (GUI) style guides [3;
11] follow the trend of capitalizing on metaphors and
analogies, and requiring trivial click'n'drag interactive
modes. But there is as much arbitrariness and language
learning there as there is in any command language interface.

Our  communication-centered  architecture  is  one  which 
exposes  the  existence  of  arbitrary  learnable  computer  codes, 
and  optionally  unfolds  [7]  successive  levels  of  their 
grammatical  and  semantic  design  rules,  so  that  users  can 
potentially  wield  such  linguistic  resources  to  build  new 
discourse,  by  adapting  and  programming  applications.  Figure 
1  presents  the  global  scenario  for  reference  in  the  following. 


Figure 1: Communication in HCI & EUP (panels: Typical HCI Scenario; Extended HCI/EUP Scenario)

Along with each extensible application, a rich learning
environment must be provided to users. Important
components thereof are multimodal multicode interfaces,
explanation KB-systems, on-line help and documentation
modules, as well as textual and graphic programming
environments. Embedded explanation modules should help
designers adopt a communication perspective as if they were
engaged in documentary book-writing or film-making, and
cared to get their message across.

The basic architecture we envisage for truly extensible
computer applications, from an end user's point of view, is
presented in Figure 2. At first sight, the adopted architecture
may appear overwhelmingly complex and undesirable from a
cost/benefit perspective. Nevertheless, at a closer look, we
realize that a number of popular office automation packages
already offer users most of the proposed components. On the
User Interface side, the only lacking modes are the NL
question-answering and the visualization facilities. MS Word
for Windows, for example, includes the other 3 alternatives
and a macro language interpreter of its own.


Figure 2: A Communication-Centered Architecture (component
labels: Knowledge Representation Language Interpreter; Graphic
Interface Widgets; Direct Manipulation Canvas; Natural Language
Question-Answering; Text and Diagram Visualization; Text Editing)

The  missing  end  of  existing  extensible  applications  is  the 
knowledge  base  one.  In  the  lower  left  side  of  Figure  2,  we 
clearly  see  an  embedded  intelligent  system  (reasoner  and 
Knowledge  Base  (KB)).  In  the  upper  side,  we  see  a  runtime 
consistency  checker  and  a  Program  and  Data  Base  (P&DB). 
This  latter  component  is  the  repository  of  end  users' 
programmed  extensions  which  are  added  to  the  application  in 
separate  modular  form. 

The  overall  communication  management  is  achieved  by 
means  of  integrated  programming-explaining  activities. 
When  a  user  engages  into  introducing  a  new  functionality  to 
an  existing  application,  explanations  are  elicited  from  him  or 
her  in  schematic  form.  Thus,  not  only  are  new  items 
introduced  in  the  P&DB,  but  also  the  corresponding 
explanatory  items  are  added  to  the  KB.  This  feature  requires 
that  programming  and  knowledge  representation  languages 
be  co-referential  and  maintain  consistency  between  each 
other. 
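
The coupling between the P&DB and the KB described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class name `ExtensibleApp`, the primitive action names, and the rule that an empty explanation is rejected are hypothetical and are not taken from the system described in the paper.

```python
from dataclasses import dataclass, field

# Assumed primitive action names; the paper lists moving, turning left,
# picking up and putting down beepers, but not these exact identifiers.
PRIMITIVES = {"MOVE", "TURN LEFT", "PICK BEEPER", "PUT BEEPER"}


@dataclass
class ExtensibleApp:
    pdb: dict = field(default_factory=dict)  # P&DB: extension name -> list of steps
    kb: dict = field(default_factory=dict)   # KB: extension name -> explanation

    def add_extension(self, name, steps, explanation):
        """Store a new extension only together with its explanation."""
        if not explanation.strip():
            raise ValueError("an explanation must be elicited for every extension")
        known = PRIMITIVES | set(self.pdb)
        unknown = [step for step in steps if step not in known]
        if unknown:
            raise ValueError(f"unknown actions: {unknown}")
        self.pdb[name] = list(steps)
        self.kb[name] = explanation

    def is_coreferential(self):
        """Consistency check: the P&DB and the KB describe the same items."""
        return self.pdb.keys() == self.kb.keys()
```

Because both stores are updated in the same call, the two can never refer to different sets of extensions, which is one simple reading of the co-referentiality requirement.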

Co-referentiality between input and output languages has
been previously proposed in the framework of user-centered
systems design [8]. Our leap here is to extend linguistic
co-referentiality from the realm of interfaces to that of
applications.  Of  course,  requiring  users  to  explain  their 
programming  to  the  system  is  preceded  by  requiring  the 
applications'  designers  and  programmers  to  do  the  same.  The 
whole point of communication-centered programming is to
tell  people  what  we  as  programmers  have  in  mind  and 
increase  the  chances  that  we  are  understood,  and  that  our 
product  becomes  really  useable  and  useful. 

3. Human-Computer Interaction, Knowledge-Representation,
and End-User Programming

We  will  now  illustrate  how  a  communication  perspective 
impacts  design  by  resorting  to  features  of  a  demo  KB 
application we have been working on. It is based on Pattis et
al.'s "Karel, the Robot - A Gentle Introduction to the Art of
Programming" [16], whose aim is to teach novices how to
program. We have built a Visual Basic Interface to Karel's
World  (see  Figure  3,  at  the  end  of  this  paper),  and  provided 
interaction  for  Karel's  minimal  operating  capacities  about 
moving,  turning  left,  picking  up  and  putting  down  beepers,  in 
a  gridded  (limited)  space  where  walls  can  block  his  passage. 
We  have  also  opened  the  opportunity  for  extensions  by 
allowing Karel to learn new things. The simplest of these
appears in the upper side of the screen on Figure 3 ("Karel's
learned actions"): TURN AROUND. Along Pattis's lines,
TURN  AROUND  is  a  possible  user-programmed  extension 
to  Karel's  ability  that  results  from  a  sequence  of  2 
consecutive  primitive  TURN  LEFT  commands  grouped 
together  by  some  macro  recording  and  naming  mechanism. 
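
A minimal Python sketch of such a macro mechanism follows. It assumes that a heading is one of four compass names and that a learned action is stored as a named list of steps; the names `learn` and `run` are ours and do not come from the Visual Basic interface.

```python
# Headings listed in counter-clockwise order, so a left turn is "next in the list".
HEADINGS = ["north", "west", "south", "east"]


def turn_left(heading):
    """Primitive: rotate the heading 90 degrees counter-clockwise."""
    return HEADINGS[(HEADINGS.index(heading) + 1) % 4]


PRIMITIVES = {"TURN LEFT": turn_left}
learned = {}  # name of a learned action -> list of steps


def learn(name, steps):
    """Record a named sequence of already known commands."""
    learned[name] = list(steps)


def run(command, heading):
    """Execute a primitive directly, or expand a learned action step by step."""
    if command in PRIMITIVES:
        return PRIMITIVES[command](heading)
    for step in learned[command]:
        heading = run(step, heading)
    return heading


learn("TURN AROUND", ["TURN LEFT", "TURN LEFT"])
```

With this definition, `run("TURN AROUND", "east")` yields `"west"`: the learned action is nothing more than the two primitive turns replayed in order.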

But, unlike what holds for the original proposal, we want
users to be told about (1) the original programmer's arbitrary
decisions, (2) this programmer's personal interpretations and
intentions in endowing Karel with computing capacity, and
(3) how they can now, as users, change the original model
and express their own interpretations and intentions in
playing with the programmable robot.

At  the  User  Interface  level,  the  new  perspective  in  design 
becomes  evident  in  the  layout  shown  in  Figure  3.  The  World 
View  window  allows  users  to  deploy  walls  and  beepers  in 
Karel's  world  via  direct  manipulation  of  graphic  objects. 
Since  the  idea  is  to  program  (i.e.  to  tell,  in  linguistic  mode) 
Karel  to  evolve  in  this  space,  direct  manipulation  is  halted 
when  it  comes  to  commanding  the  robot  to  move  and  change 
the  world.  For  this,  users  have  the  more  traditional  options  of 
menu  selection  and  button  pressing. 

Two  additional  windows  appear  at  the  interface:  one  for 
Karel's  messages  and  another  for  the  designer's  messages. 
The  former  are  the  result  of  the  robot's  being  asked 
situational  questions  such  as:  "How  many  beepers  do  you 
have?"  or  "Where  have  you  been?".  The  latter  are  produced 
as  a  typical  result  of  "Why"  and  "Why-not"  questions.  For 
instance,  queries  about  "Why  Karel  cannot  carry  more  than 
10  beepers  with  him"  should  not  be  directed  to  Karel,  but  to 
his  "creator"  —  the  programmer.  The  answer  is  expected  to 
be  that  the  number  10  is  an  arbitrated  value  a  user  CAN  or 
CANNOT  alter  in  the  original  application. 

Note  that  knowledge  in  this  scenario  is  used  BY  THE 
DESIGNER,  not  by  the  agent.  Karel  IS  NOT  an  intelligent 
agent. We could endow Karel with virtually no knowledge
and have him shut down if he is ordered to move forward and
bumps into a wall. However, in our program, we have
decided  to  test  for  disastrous  actions  before  they  are 
performed.  It  is  the  programmer,  and  not  Karel,  who  stops 
the  agent  short  of  shutting  down  and  uses  a  KB  explanation 
system  to  reflect  his  decision  that  Karel  is  a  physical  model 
which  does  not  traverse  walls.  In  Figure  3,  another  shutdown 
situation  caused  by  a  wrong  MOVE  command  is  prevented 
by  the  designer  and  explained  in  the  lower  right-hand  side  of 
the  screen. 


An  interesting  extension  case  is  that  of  a  user  "teaching" 
Karel  how  to  GO  HOME.  Experienced  programmers  would 
do  this  by  implementing  complex  graph  navigation 
algorithms,  so  that  the  shortest  possible  path  between  Karel's 
current  position  and  his  home  (coordinate  0,0)  is  securely 
followed. End user programming, however, could introduce
some apparently "naive" computational solutions, such as making
Karel go home by walking back on his steps (note that in
Figure  3,  users  can  ask  Karel  "Where  have  you  been?"). 
Since  (1)  he  always  starts  his  journey  from  home,  and  (2) 
any  place  he  can  currently  be  has  necessarily  been  reached 
through  a  valid  path,  he  can  always  track  his  way  back  by 
resorting  to  his  "memory"  of  visited  places. 
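
The "walk back on his steps" solution can be sketched as follows. We assume, for illustration only, that Karel's memory is a list of grid coordinates in visiting order, starting at home (0,0) and ending at his current position; the function name `go_home` is hypothetical.

```python
def go_home(visited):
    """Return the cells Karel walks through to get home by retracing his steps.

    `visited` lists every cell in visiting order: it starts at home (0, 0)
    and ends at Karel's current position.
    """
    if not visited or visited[0] != (0, 0):
        raise ValueError("the trail must start at home (0, 0)")
    # Drop the current position and walk the rest of the trail backwards.
    return list(reversed(visited[:-1]))
```

No knowledge of walls is needed: every step of the way back was already a valid step on the way out.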

Minimal  path  algorithms  can  be  as  much  of  an  idealization  as 
indefinitely  large  memory  store  space  (required  for  Karel  to 
be  able  to  remember  EVERYTHING  he  has  done  from  the 
beginning  of  a  session).  For  an  EUP  scenario,  the  best  choice 
is  that  which  represents  the  least  cognitive  effort  for  the  user 
to  program.  But,  an  experienced  programmer  certainly  has 
reasons  to  go  for  the  minimal  path  idealization.  Among  other 
qualities,  this  approach  is  more  general  than  the  maximal 
storage  one;  it  serves  a  general  capacity  of  going  from  any 
one  point  in  space  to  another,  no  matter  the  peculiar  relations 
they  may  bear  on  an  agent's  experience  in  the  world  (e.g. 
having  been  previously  visited,  having  beepers  at  it,  having 
walls,  being  home). 
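
For comparison, the minimal path alternative can be sketched with a breadth-first search. The paper does not say which graph navigation algorithm an experienced programmer would pick, so breadth-first search is our choice; the grid dimensions and the representation of a wall as a pair of adjacent cells are likewise assumptions.

```python
from collections import deque


def shortest_path(start, goal, width, height, walls=frozenset()):
    """Breadth-first search on a width x height grid.

    `walls` holds frozenset({a, b}) pairs of adjacent cells separated by a wall.
    Returns the list of cells from start to goal, or None if goal is walled off.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in previous and frozenset({cell, nxt}) not in walls:
                previous[nxt] = cell
                queue.append(nxt)
    return None
```

Unlike the retracing solution, this function serves any pair of cells, but it requires the programmer to model the whole world (its limits and its walls) rather than only Karel's own history.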

For  Karel  to  be  able  to  GO  TO  anywhere  in  his  world,  there 
must  be  KNOWLEDGE  about  the  way  he  was  designed. 
Most  KB  systems  have  no  problem  in  endowing  the  agents 
with  such  reasoning  capacity.  Tale  telling  interfaces  to  such 
systems  slip  easily  into  1st  person  discourse  and  present 
messages  that  start  with  "I  cannot"  or  mention  "my  path". 
The  implications  of  such  anthropomorphism  may  lead  users 
to  build  numerous  misconceptions  about  how  intelligent  and 
able  computer  agents  can  be.  For  instance,  given  that  an 
agent  knows  he  cannot  traverse  walls,  he  could  be  expected 
to  know  why  and  what  would  happen  if  he  could  (i.e.  have 
meta-knowledge). 

In  our  Semiotic  Engineering  approach,  the  designer  is  a 
legitimate  voice  in  discourse.  Whatever  the  agent  can  or 
cannot  do,  it  is  a  consequence  of  human  interpretation  and 
programming. Therefore, if Karel is endowed with the ability
to go from any one point to another, or to diagnose that he is
walled into (or out of) a subspace, this is the product of a
designer's intellectual and technical decision.

The impact of having users identify the origin of such
decisions and interpretations is similar to that of letting them
know  that  trying  to  undo  a  SAVE  action  in  typical  text 
editing  interaction  is  not  wrong.  It  is  perfectly  logical  to  try 
to  backtrack  from  SAVE,  in  the  same  way  one  backtracks 
from  a  SELECT_ALL  and  DELETE  sequence  of  actions. 
However, it is the cost of maintaining formatted files in
"buffers" within the application that drives arbitrary choices,
even if net damage in both situations is rigorously the same
from the user's point of view (i.e. backtracking from a
disastrous piece of interaction).

The  costs  of  imparting  more  knowledge  in  extensible 
application  environments  may  be  high.  But,  we  will  give  an 
example  of  how  we  have  designed  an  explainable  KB  in 
Visual  Prolog,  to  be  coupled  to  Karel's  Visual  Basic 
Interface.  It  should  show  that  the  effect  of  such  knowledge  is 
quite  considerable  compared  to  current  commercially 
available products. A fragment of the Prolog code for our
KB is shown below.

1. count_bagbeepers :-
       not(beepers_in_bag(_)), nl,
       write("Karel and his world have not been loaded yet."), nl, !.

2. count_bagbeepers :-
       beepers_in_bag(X), X=0, nl,
       write("As of now, I carry no beepers in my bag."), nl, !.

3. count_bagbeepers :-
       beepers_in_bag(X), X<>0, nl,
       write("As of now, I carry ",X," beeper(s) in my bag."), nl, !.

Messages associated to the count_bagbeepers predicate are
answers to questions about how many beepers Karel is
carrying with him at a given point. Although they may at
first look like regular simple output messages any program
can print, from a KB perspective they are epistemologically
different. Clauses 2 and 3 are consistent with an
agent-centered point of view. Clause 1, however, is quite different
in that an agent that hasn't been "created" (loaded) cannot
possibly inspect his state. In communicative terms, clause 1
is used in the generation of Designer's Messages, whereas
the other two are used in that of Karel's Messages. The effect
of  multiple-person  discourse  is  more  clearly  noticeable  in  the 
following  case. 
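
The split between the two voices can be made explicit by tagging each message with its speaker. The Python sketch below mirrors clauses 1 to 3; returning a (speaker, message) pair and using `None` for "not loaded" are our own conventions, not part of the Prolog KB.

```python
def count_bagbeepers(beepers_in_bag):
    """Answer "How many beepers do you have?" as a (speaker, message) pair.

    `beepers_in_bag` is None when Karel and his world have not been loaded.
    """
    if beepers_in_bag is None:
        # Clause 1: an agent that does not exist yet cannot speak for himself.
        return ("designer", "Karel and his world have not been loaded yet.")
    if beepers_in_bag == 0:
        # Clause 2: agent-centered, first person.
        return ("karel", "As of now, I carry no beepers in my bag.")
    # Clause 3: agent-centered, first person.
    return ("karel", f"As of now, I carry {beepers_in_bag} beeper(s) in my bag.")
```

An interface could then route each pair to the matching message window by inspecting the first element.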

no_move(east, coord(X,Y)) :-
4.1.   NewX = X+1, falloff(coord(NewX,Y)), nl,
       write("This move would make Karel fall off the limits of his world."), nl, ! ;
4.2.   NewX = X+1, bump(coord(NewX,Y)), nl,
       write("This move would make Karel bump right into a wall."), nl, !.

The above OR clause (";") states the disjunction of
conditions that are sufficient for a move to be impossible
(codified in the KB by the predicate
no_move(east, coord(X,Y))). In our design, this clause
is used in messages generated by the designer, not by
Karel. Our intention is to prevent Karel from being
conceived as an intelligent (even if modestly so) agent. The
implicatures in switching discourse from designer to Karel
have been associated above to arbitrary levels of
meta-knowledge designers would have to endow computer agents
with. The role of a designer is then to dismiss
misconceptions with messages such as those proposed in 4.1
and  4.2.  Moreover,  the  very  fact  of  telling  "the  truth"  about  a 
computer  artifact  duly  places  designers  in  their  seat  of 
software  constructors,  and  weakens  the  idea  that  software  is 
"just  like  that",  hopelessly  inexplicable. 


At  KB  design  time,  a  communication  perspective  leads  us  to 
generate  code  of  the  sort  illustrated  by  the  following  Prolog 
lines  relative  to  reasoning  about  move: 

5.1.   move, nl, write("I have moved forward."), nl, ! ;

5.2.   not(move), nl, write("Karel has not moved forward."), nl, !.

In  other  words,  we  codify  not  only  the  knowledge  about  the 
conditions  that  allow  Karel  to  move  and  know  such  things  as 
his  new  position  and  what  action  he  has  just  performed,  but 
also  those  conditions  that  prevent  him  from  doing  so.  In  each 
case,  discourse  is  produced  by  different  communicators.  For 
every negated action, there must be an associated WHY
NOT question users may ask DESIGNERS to understand
what happened. This is made possible if designers introduce
knowledge codified as:

6. karel(Ori,Loc), no_move(Ori,Loc), !.

Combined  with  the  instantiation  of  the  no_move  predicate 
proposed  in  4.1  above,  goal  6  generates  the  explanation 
illustrated  in  Figure  3.  Thus,  a  discipline  in  generating 
explanatory  discourse  for  every  piece  of  extensible  code  in  a 
given application is required from designers. This discipline
can be facilitated by a programming environment that elicits
knowledge from designers and maintains co-referentiality
between  programs  and  knowledge  base.  From  the  Visual 
Basic  end,  the  code  for  MOVING  Karel  has  only  minor 
additions,  as  seen  in: 

7.  Public Sub Move()
        Dim newColumn As Integer
        Dim newRow As Integer
        newColumn = column
        newRow = row
        'begin prolog
        <<KB CHECKING>>
        'end prolog
        iRow = newRow
        iColumn = newColumn
        vRobot.SetCoords iColumn, iRow
        vRobot.Draw
    End Sub

The  Prolog  program  call  in  the  code  may  include  data 
exchange or file read/writing instructions. The real change in
design is a consistency check one step before codifying
the new drawings to appear in the GUI depicted in Figure 3.
The  bonus  is  the  generation  of  explanations  and  the 
transmission  of  the  designer's  rationale  in  building  these 
parts  of  the  application.  We  should  emphasize  that  those 
parts  that  are  not  meant  to  be  extended  by  the  user  do  not 
require  the  same  degree  of  knowledge  elicitation  and 
explanations. 
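
To make the "check before drawing" step concrete, here is a small Python sketch of a move that consults an explanation function first. It is not the authors' Visual Basic/Prolog coupling: the grid limits, the wall representation, and the function names `why_not_move` and `move` are assumptions made for illustration, while the message texts are those of clauses 4.1, 4.2 and 5.1.

```python
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def target(pos, heading):
    """Cell Karel would occupy after one step in the given heading."""
    dx, dy = STEP[heading]
    return (pos[0] + dx, pos[1] + dy)


def why_not_move(pos, heading, width, height, walls=frozenset()):
    """Designer's explanation when a move is impossible, otherwise None."""
    new = target(pos, heading)
    if not (0 <= new[0] < width and 0 <= new[1] < height):
        return "This move would make Karel fall off the limits of his world."
    if frozenset({pos, new}) in walls:
        return "This move would make Karel bump right into a wall."
    return None


def move(pos, heading, width, height, walls=frozenset()):
    """Run the consistency check first; update the position only if it passes."""
    reason = why_not_move(pos, heading, width, height, walls)
    if reason is not None:
        return pos, ("designer", reason)
    return target(pos, heading), ("karel", "I have moved forward.")
```

The position handed to the drawing code changes only when the check succeeds, and a refusal always comes with the designer's reason attached.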

From  a  user  perspective,  the  interface  overhead  is  that  of 
connecting  user-generated  code  to  user-generated 
explanations  about  code.  Just  as  EUP  languages  are  not  C  or 
Pascal,  but  some  easier  subset  of  these,  KB  representation 
languages  are  not  Prolog.  The  need  for  more  adequate 
programming languages goes hand in hand with the need for
more adequate knowledge acquisition languages. This is the
aim  of  ongoing  research  in  our  current  project. 

4.  Conclusion 

Computer  Semiotics  has  grown  from  Interface  Design  [13], 
to  Systems  Analysis  and  Design  [2],  and  Multimedia 
Applications Design [1; 14]. Previous semiotic accounts of
computing included mathematical perspectives [e.g. 18], as
well  as  philosophical  ones  [e.g.  9].  The  work  of  Peirce, 
himself,  with  its  original  logic  underpinnings  [17],  now 
experiences  a  revival  in  theoretical  computing  and 
information  processing  [e.g.  10]. 

The  anticipated  need  for  end  user  programming  has  not  been 
fully  explored  by  Computer  Semiotics  [6],  and  requires  that 
software  design  be  made  with  a  view  on  letting  users  learn 
more  about  applications  than  is  required  in  most  current  HCI 
scenarios.  Heavier  cognitive  loads  involved  in  programming, 
if  compared  to  typical  human-computer  interaction,  magnify 
the  architectural  scope  of  application  environments  and 
suggest  that  embedded  knowledge-based  systems  may  play  a 
major  role  in  supporting  the  organization  of  explanatory 
discourse  about  extensible  application  modules.  In  semiotic 
terms,  reasoning  systems  allow  users  to  gain  one  degree  of 
perspective  on  a  designer's  interpretation  of  a  given  domain, 
its  objects  and  tasks,  as  well  as  on  his  or  her  encoded 
abstractions  and  arbitration  in  building  a  computer  model  of 
the  domain  and  its  implementation. 

This leads us to adopt a broader semiotic perspective on
Software Engineering, and to view programs generated in
the communicative approach proposed above as effectors of
limited  semiosis.  Although  the  term  may  shock  the  Peircean 
tradition,  in  Computer  Science  it  gains  life  and  dimension  as 
interface  messages  are  algorithmically  interpreted  in  a 
procedurally  limited  chain  of  signs  that  turn  into  other  signs 
and  eventually  crystallize  into  output.  Whichever  sign  is 
present  in  this  chain  was  once  part  of  the  designer's 
interpretant  derived  from  the  application's  domain. 

Knowledge-based  components  attached  to  extensible 
applications  empower  the  designers'  capacity  to  express 
themselves,  and  the  users'  capacity  to  make  sense  out  of 
systems.  We  have  shown  how  embedded  KB  explanation 
systems  enlarge  the  spectrum  of  interface  discourse  and 
allow  for  computer  agents,  programmers  and  users  to  express 
their  why's  and  why-not's  in  writing  and  executing  software. 
The  linguistic  overhead  is  that  of  designing  co-referential 
programming  and  knowledge  representation  languages  that 
users can access from the application's interface.

Figure 3: A snapshot of Karel's World screen (windows: Karel's
Messages, offering the questions "Karel, where are you?", "How
many beepers do you have?" and "Where have you been?", with a
reply in which Karel reports his coordinate and that he is facing
East; Designer's Messages, offering the questions "What was
Karel's world like at the start?", "Why <world state or Karel's
action>?" and "Why not <Karel's action>?", with the reply "This
move would make Karel fall off the limits of his world")

ABSTRACT

The  aim  of  this  paper  is  to  introduce  the  theoretical  foundations  of 
an  approach  for  intelligent  systems  development.  Derived  from 
semiotics,  a  classic  discipline  in  human  sciences,  the  theory 
developed  provides  a  mathematical  framework  for  the  concept  of 
knowledge  and  for  knowledge  processing.  As  a  result,  a  new 
perspective  to  study  and  to  develop  intelligent  systems  emerges.  A 
taxonomy  of  elementary  types  of  knowledge  is  proposed  based  on 
the classification of types of signs in semiotics, followed by
another classification of knowledge from the point of view of
application in cognitive systems. In addition, we propose the
mathematical definition of objects, object systems and object
networks, to model mathematically the different types of knowledge
described.  The  symbiosis  of  such  key  concepts  introduces  a 
computational  paradigm  to  develop  and  implement  intelligent 
systems,  called  here  computational  semiotics. 

KEYWORDS: computational semiotics, theory of objects,
intelligent  systems,  models  of  knowledge 

1.  INTRODUCTION 

Human intelligence has always been of interest and curiosity
in the scientific world. In 1991, Albus published an outline
for a theory of intelligence, and simultaneously Brooks [3]
argued that intelligent behavior does not necessarily
require representation or inference. Additional
aspects of intelligence, e.g. approximate reasoning (including
fuzzy or incomplete concepts), learning, prediction, and
adaptation are being studied in the fields of computational
intelligence [14] and soft computing [13]. Considerable
research effort on fuzzy set theory, neural networks and
evolutionary systems has been and still is being pursued. The
contribution of these fields in understanding the nature of
human intelligence has been quite impressive.

Parallel to the developments in computer science and
engineering, in human sciences there was a similar effort to
model intelligence and intelligent behavior. Well known
examples include the work of Piaget [2], and the development
of semiotics by Peirce and Morris [12,9,10,11], just to
mention a few. Semiotics deals with signs (representations),
objects (phenomena) and interpretants (knowledge), that is,
the main issues in cognition and communication. Semiotics
has proven to be a useful tool especially when the basic
ingredients of intelligence and their relationships are of
concern.

Despite the challenges in discovering the formal mysteries
behind human intelligence and the intrinsic difficulties in
building machines and computer programs to emulate
intelligent behavior, very few works have analyzed intelligence
in an integrated and organized manner. Often,
only particular aspects of intelligence are addressed. A
notable exception is Albus' 1991 paper. In his
work, Albus provides a systematic study of intelligence, and
gives a description of the different parts composing the global
phenomenon. The integration of all parts should lead to
intelligent behavior. Albus' definitions and theorems are
essentially linguistic due to the lack of a formal system to
describe intelligence. In other words, currently there is no
adequate mathematical model to describe intelligence as a
whole. Most existing formalisms are closely tied to particular
aspects, being unsuitable for a global, computational
formalization of intelligent systems. Semiotic Modeling and
Situation Analysis (SSA), developed by Pospelov and his team
in Russia, was another important attempt in this direction. A
key feature of the SSA approach is the extraction of knowledge
from descriptive information by its consistent analysis
based upon well established algorithms [7]. From this point of
view, mathematical tools of semiotics are considered to
include those used in control science, pattern recognition,
neural networks, artificial intelligence, and cybernetics. But
semiotic-specific mathematical tools (for combining signs and
symbols and extracting meaning) are still in the process of
development [7].

In  [8],  the  use  of  semiotics  as  a  tool  suitable  for  the  analysis 
of  intelligent  systems  was  suggested.  Concurrently,  in  [4]  the 
computational  view  of  semiotics  for  modeling,  development 
and  implementation  of  intelligent  systems,  the  computational 
semiotics approach, was proposed. Computational semiotics
is built upon a mathematical description of concepts from
classic semiotics. Its formal contents can be regarded as a
contribution towards the development of semiotic-specific
mathematical tools. Thus, it is in the very realm of the formal
foundations of intelligent systems. The main purpose of this
paper is to introduce the mathematical aspects that
underlie computational semiotics.

2. ELEMENTARY TYPES OF KNOWLEDGE

Based on the classification of types of signs in semiotics, we
have derived a hierarchy of elementary types of knowledge,
shown in figure 1.

Knowledge is divided into three main classes: rhematic
knowledge, dicent knowledge and argumentative knowledge.
Strictly speaking, rhematic knowledge concerns the semantics
of (linguistic) terms, dicent knowledge combines sequences of
terms with truth values and analyses how rhematic knowledge
relates to a real environment, and argumentative knowledge
embodies the knowledge of how knowledge is transformed,
comprising reasoning, inference and learning.

3.  APPLIED  KNOWLEDGE 

Based on its intended use, knowledge can be classified as
designative, appraisive or prescriptive (figure 2), terms coined
by Morris [9,10,11]. This classification is complementary to
the elementary types of knowledge. In principle, any
elementary type of knowledge can be used as designative,
appraisive or prescriptive, i.e., this classification is orthogonal
to the elementary classification.

Designative  knowledge  models  the  world.  For  this  purpose  it 
uses  rhematic,  dicent  and  argumentative  knowledge,  either 
specific  or  generic.  Designative  knowledge  can  also  be  viewed 
as descriptive knowledge. A cognitive system initially has
little, or possibly no, designative knowledge at all. Usually
designative knowledge emerges from the interaction between
the system and the world.

Figure 2 - Classification of Applied Knowledge

Appraisive knowledge is a type of knowledge used as an
evaluation, a judgment, a criterion to measure the success in
achieving goals. In natural systems, appraisive knowledge is
closely related with the essential goals of a being:
reproduction, survival of the individual, survival of the species,
increasing knowledge about the world, for example.
Depending on the goal it assumes special forms like: desire,
repulsion, fear, anger, hate, love, pleasure, pain, comfort,
discomfort, etc. Essentially, appraisive knowledge evaluates if a
given sensation, object, or occurrence is good or not, as far as
goal achievement is concerned.

Prescriptive  knowledge  is  intended  to  act  on  the  world. 
Basically,  prescriptive  knowledge  is  used  to  establish  and  to 
implement  plans  through  actuators.  However,  prescriptive 
knowledge  will  not  necessarily  end  up  with  an  action. 
Prescriptive knowledge may also be used to make predictions,
but only one of them is selected to generate an action.

Figure 1 - Classification of the Elementary Knowledge Types

4. A MATHEMATICAL THEORY OF OBJECTS

This  section  introduces  concepts  and  definitions  as  a  formal 
background  for  computational  semiotics.  The  focus  here  is  on 
the  main  issues  and  definitions  only.  For  a  more  in  depth 
coverage  the  reader  is  referred  to  [4]  or  [5,6]. 

4.1.  Variable 

Let $N$ be a countable set with a generic element $n$, and
$X \subseteq U$. A variable $x$ of type $X$ is a function
$x : N \to X$. Note that a function is also a relation and hence
it can be expressed by a set. Thus, $x \subset N \times X$.

4.2.  Class 

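A class $C$ is a set whose elements $c_i$ are tuples of the type:

$(v_1, v_2, \ldots, v_n, f_1, f_2, \ldots, f_m), \quad n \ge 0, \; m \ge 0$

where $v_i \in V_i$, and $f_j$ are functions

$f_j : \times_{p \in P_j} V_p \to \times_{q \in Q_j} V_q$

Here $\times$ means the cartesian product, $P_j \subseteq \{1, \ldots, n\}$ and
$Q_j \subseteq \{1, \ldots, n\}$.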

4.3.  Object 

Let $C$ be a non-empty class and $c$ be a variable of type $C$.
Thus  c  is  an  object  of  class  C. 

4.4.  Object  System 

A set of objects $c_i$ is an object system if the $c_i$'s are related to
each other in the sense that each instance of such objects in a
given instant is a function of the instances of all objects in the
previous time instant:

$c_k(t+1) = f(c_1(t), \ldots, c_n(t))$.

This  is  only  a  concise  definition  of  an  object  system.  The 
complete  definition  is  far  more  involved.  The  reader  is 
referred  to  [4]  or  [5,6]  for  details. 
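
The concise definition can be illustrated with a synchronous update in Python. This is a sketch under our own simplifying assumptions: each object instance is a plain number, and every object has its own update rule that receives the complete state of the previous instant. The full definition in [4] and [5,6] is richer than this.

```python
def step(state, rules):
    """Compute every c_k(t+1) from the complete state (c_1(t), ..., c_n(t))."""
    return tuple(rule(state) for rule in rules)


def run(state, rules, steps):
    """Return the list of states from instant 0 up to instant `steps`."""
    history = [state]
    for _ in range(steps):
        state = step(state, rules)
        history.append(state)
    return history


# Two hypothetical objects: a counter and an accumulator that sums the counter.
rules = [
    lambda s: s[0] + 1,     # c_1(t+1) = c_1(t) + 1
    lambda s: s[0] + s[1],  # c_2(t+1) = c_1(t) + c_2(t)
]
```

Because `step` builds the new tuple from the old one only, no object ever sees a value that another object produced in the same instant.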

4.5.  Object  Network 

An object system is too generic an approach for modeling
elementary knowledge types. An object network is a special
type of object system in which additional restrictions
concerning interactions are included. To distinguish object
network from object system, let us assume places and arcs
whose roles are similar to those in Petri nets. Objects in
places can only interact with objects in places connected
through arcs. Thus, at each instant, the objects defined should
be at one place. For each place there is a set of places
connected with it by means of input arcs. These places are called
the input gates of the place. Analogously, each place has a set
of places connected with it by means of output arcs, called
output gates.
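
The place and arc vocabulary can be sketched as a small Python class. The sketch reads an arc as a directed pair (source place, target place) and treats two objects as able to interact only when an arc joins their places in either direction; both readings are our assumptions, and the class and method names are hypothetical.

```python
class ObjectNetwork:
    def __init__(self, arcs):
        """`arcs` is an iterable of (source_place, target_place) pairs."""
        self.arcs = set(arcs)
        self.location = {}  # object name -> the single place it occupies

    def input_gates(self, place):
        """Places connected to `place` by arcs arriving at it."""
        return {src for src, dst in self.arcs if dst == place}

    def output_gates(self, place):
        """Places connected to `place` by arcs leaving it."""
        return {dst for src, dst in self.arcs if src == place}

    def put(self, obj, place):
        """At each instant an object is at exactly one place."""
        self.location[obj] = place

    def can_interact(self, a, b):
        """True when an arc joins the places of the two objects."""
        pa, pb = self.location[a], self.location[b]
        return (pa, pb) in self.arcs or (pb, pa) in self.arcs
```

For example, with arcs `("sensor", "memory")` and `("memory", "actuator")`, the place `"memory"` has input gate `"sensor"` and output gate `"actuator"`, and an object at `"sensor"` cannot interact directly with one at `"actuator"`.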

4.6.  Additional  Definitions 

Other definitions, important for the particular aspects of some
of the knowledge types, may be found in [4] and [5,6]. They
include the temporal restriction for objects; set variable;
generic objects; fuzzy objects; meta-objects; instances of
meta-objects; occurrences of meta-objects in objects, generic
objects and fuzzy objects; generic meta-objects; occurrences of
generic meta-objects in objects, generic objects and fuzzy
objects; fuzzy meta-objects; and occurrences of fuzzy
meta-objects in objects, generic objects and fuzzy objects.

5.    MODELS  OF  KNOWLEDGE  TYPES 

Using  the  mathematical  definitions  given  above,  we  are  able 
to  model  the  elementary  types  of  knowledge  and  use  them  to 
build intelligent systems, mainly concerning the applied
aspects of knowledge, i.e., their property of being designative,
appraisive or prescriptive.

Basically, sensorial and object rhematic iconic knowledge
can be modeled by passive objects, i.e., objects that do not
have functions in their image tuples. Occurrence rhematic
knowledge can be modeled by meta-objects (standard, generic
or fuzzy). Those meta-objects can be reduced, however, to
objects by means of appropriate techniques. Dicent
knowledge can also be modeled by objects. Argumentative
knowledge must be modeled by active objects, i.e., objects that
have working functions in their image tuples. The complete
description of such modeling representations can be found in
[4] or [6].

This leads to a scenario where a whole intelligent system may
be modeled by an object network. The representation by
means of an object network has many advantages. An object
network is more powerful than a Petri net in the sense that it
allows modifications in its active parts, which is not possible
in Petri nets. This is important for systems that have learning
and adaptive capabilities, which cannot be represented by
Petri nets, including colored Petri nets. The possibility
of representing an intelligent system by a formal
computational tool allows for a more in-depth study of
phenomena involving intelligent behavior. Some properties
that were targeted linguistically in early studies of intelligent
systems (e.g. [1]) may be translated into a mathematical
framework, allowing for a more solid foundation. In this
sense, the tools provided by computational semiotics seem to
be a very promising set of mechanisms for building a future
theory of intelligence.
6.  CONCLUSIONS

In this paper we briefly introduced a new approach to the
study of intelligent systems. This approach, called
computational semiotics, uses concepts brought from
semiotics to propose a hierarchy of elementary types of
knowledge and, based on a mathematical framework, models
such knowledge mathematically. Due to its object-oriented
nature, the mathematical model is well suited to
computational implementation, providing, in addition, a
mathematical description of intelligent systems.

It is very important to stress, though, that this approach is
at its very beginning. The presented taxonomy of types of
knowledge, although significant, is only partial. In his works,
Peirce identifies more than 100 different types of signs,
eventually implying different types of knowledge. These
are not included in the presented taxonomy. Still, the
presented taxonomy allows the elaboration of rather
sophisticated intelligent systems. More than that, it creates an
organization that is not usually found in the literature
concerning the differences among the kinds of knowledge used
when building intelligent systems. The formalism presented for
objects in this paper does not aim to be a general theory of
objects, but simply to lay foundations for the future growth of
such a theory. Some extensions are actually needed, mainly to
consider asynchronous interaction among objects. Still, the
formalism, in its current form, is suitable for representing
intelligent systems, which is a very important characteristic.

Despite their representational power, the object networks
developed upon the mathematical concept of objects still
have many limitations. For example, analysis tools are still
incipient when compared with other modeling tools, e.g.,
Petri nets. Indeed, very few systems have been modeled by
the object network formalism. In addition, there is a lack of a
formal representation for the types of knowledge not covered
by our elementary knowledge hierarchy. As new types of
knowledge are included in the hierarchy, new formal
definitions will be demanded. Very few intelligent systems
have been built so far using the computational formalism. To
consolidate object networks as a valid and general tool for
modeling intelligent systems, it is still necessary to solve a
broad class of problems to emphasize their virtues and to
precisely identify the extensions needed.

An application example concerning the control of an
autonomous vehicle was successfully developed and
implemented using the computational semiotics approach
[4,6].

7.  REFERENCES

ABSTRACT

This  paper  reports  on  a  new  model  giving  a  sequential  semantic 
representation  of  an  action  verb.  We  show  that  the  algorithm  for  static 
categorization  which  utilizes  Ideational  information  to  represent  the 
defining  characteristics  of  individuals  can  also  play  a  central  role  in  the 
dynamic  categorization.  Static  categorization  involved  in  the 
definition  of  stative  predicates  is  stipulated  in  terms  of  Category 
Function.  Orientation  Function,  taking  as  one  of  its  arguments  the 
defining  characteristics  of  the  denotation  of  each  category,  is  defined  to 
capture  the  correspondence  relation  between  the  bodies  of  the  same 
individual  at  two  different  time-points.  With  the  former  two  functions 
and  the  function  derived  from  the  latter,  it  is  possible  to  construct  an 
algorithm  to  trace  the  movements  of  the  place-points  of  individuals' 
bases.  This  enables  us  to  distinguish  the  parts  which  have  more  or  less 
moved  from  those  occupying  the  same  space.  Our  point  is  that 
capturing  sequentially  the  moving  parts  (of  an  individual)  relevant  for 
an  action  it  performs  is,  in  principle,  necessary  and  probably  sufficient 
to  represent  and  interpret  the  cognitive  meaning  of  action  sentences. 
The  moving  parts  playing  central  roles  in  order  during  each  action  are 
defined  in  terms  of  the  features  similar  to  those  utilized  in  phonology. 
After  the  notion  "interval"  is  introduced  to  decompose  the  structure  of 
action  sequentially,  Action  Function,  the  total  algorithm  for  specifying 
the  moving  parts  involved  in  each  step  of  an  action,  is  defined  and 
discussed.  The  category  function  is  built  into  this  algorithm  together 
with the interval length function. Action Function is a device for
"monitoring" and marking the parts of an action as it proceeds. Action
formula,  another  algorithm  for  action  without  recourse  to  intervals,  is 
also  defined  and  investigated.  Finally,  some  residual  problems  involved 
in  this  framework  are  pointed  out  and  discussed. 

KEYWORDS:  orientation, categorization, defining features,
intervals, action function, action formula

1.    HISTORICAL  BACKGROUND 

In the semantic studies so far undertaken, efforts have been made to
construct an adequate interpreting device based on the semantic
relations borne by the main verb to its arguments. For this purpose,
several universal primitives have been sought to represent and
explain the overall structure of propositional meaning. Lakoff's
causative analysis showed how to factor verb meaning into
primitives and represent it with lexical decomposition [7], and
Jackendoff proposes a lexical structure in which propositional
meaning is translated in terms of thematic relations [5], originally
proposed in Gruber [4]. With these, it has become possible to
represent explicitly the semantic organization of the overall
state/event structure denoted by the relevant verb of each sentence.
Thematic relations were neatly accommodated into the government
and binding framework proposed in Chomsky as theta roles [2]. In
the causative analysis, the causer is made distinct from the caused
event, and in the thematic-relational analysis, verb meaning is
categorized in terms of the moving object, the way of motion, the
moving manner, etc. In neither case, however, is semantic
"factorization" sequential; thus neither analysis can represent the
temporal change in the motion of the agent of an action in a
step-by-step manner.

Montague Grammar was successful in representing the meaning of
syntactic constituents in purely denotational terms, which procedure
was achieved in a compositional manner by using type theory [8].
His model theory consists of an ontology, a set of entities, and a
denotation assignment function, the latter of which assigns a unique
denotation to each constant in the object language. In this model,
grammatical categories belong to functional types, the
denotation of transitive verbs of type $\langle e, \langle e, t \rangle\rangle$
thus being represented as $(\{1,0\}^A)^A$. This extensional analysis,
however, cannot reach the internal structure of verbs unless some
decompositional approach is invoked. It did not dare to decompose
verb meaning dynamically, solely translating it as a semantic
primitive.

In this paper, I will introduce a new semantic model in which
denotation assignment can be performed sequentially in the cases of
action verbs in order to represent each step of their processes. I will
show that by postulating that each individual is a sequence of sets of
place points occupied by it at time-points, a more articulate
structure can be given to the model.

2.    CATEGORIZATION  AND  PARTS 

One of the human cognitive abilities is the categorization of things
from their appearances. This is possible because humans can sort
out some individuals as a specific kind from others by checking all
the bits of its apparent definition. Humans can also further
distinguish the individuals of the same category, not only humans
but other higher animals, still lower organisms, even minerals. On
the other hand, they can identify the same individual, though it is
seen from different physical points of view. Thus, humans can
readily identify the upright and upside-down figures of the same
individual. The point is that despite such various differences,
individual or orientational, humans can readily determine
membership. This indicates that criteria for categorization should
be neither too narrow nor too broad to include individuals in the
same kind.
When recalling a thing of a certain category to mind, we can readily
imagine the parts which contribute naturally to the definition of its
kind. Thus, in the case of "horse", it has such and such a head, a
tail, legs, etc. In that of "human being", he/she has such and such
hands in addition to his/her head and legs. It would, however, be a
bit unrealistic to think of a desk or a table totally without legs.

We assume that when confronted with an individual whose category
is to be determined, humans match it with the stored "prototypical"
sets of figures (including their parts) shared among the members of
the relevant categories. Each category can be successfully assigned
to an individual only when the latter's figure can be exhaustively (or
sufficiently, in most exceptional cases) decomposed into the
primitive parts which are to be matched to those of the former's
prototype. Those building parts of the target individual, in turn, can
be reconstructed into the meaningful whole belonging to the
former.

3.  CATEGORIZATION  OF  ACTIONS 

Motions are spatiotemporal objects. Human beings can recognize
several, not all, of the possible motions and verbalize them as
actions if they fulfill the definitions of the action verbs of their own
languages. This does not mean that the possible "actions" excluded
from verbal actions cannot be linguistically expressed. They can be
expressed by building words into phrases. From a purely physical
point of view, actions are spatiotemporal continuums. Each action
involves a continuous change of appearance exhibited by its agent.
It would, however, be incorrect to say that humans can only
recognize such objects as actions if they continue to observe them.
In real perception, it is very often sufficient to recognize an action
merely by glimpsing the appearance of its agent. This is
possible despite the fact that all of the tokens of an
action are distinct not only in time and location of occurrence but
also in duration, size, proportion, etc. This is a piece of evidence that
humans are capable of categorizing various tokens of a motion under
one category of action in a manner that does not burden them with
too heavy a load for recognition.

4.  POSTURES  OF  MOVING  OBJECTS 

To  understand  what  an  individual  is  doing  and  to  express  (a 
part/parts  of)  its  movement  in  terms  of  an  action  sentence  of  a 
natural  language,  we  need  to  capture  not  only  the  position  but  the 
posture  of  the  moving  individual  at  each  time  during  its  action. 


Seen as a moving physical object, however, an individual
performing an action is undergoing a sequence of changes in the
position of its volume. To decide which category of action the
relevant movement belongs to, we have to deliberately distinguish
the changes in its posture from the concomitant transition of its
whole volume.

If we focused solely upon the course of its movement, we could
easily detect its trace as a path with some thickness occupied by the
whole volume of the moving individual. But this is not enough.
To accumulate sufficient information for categorization of action,
we need to keep calculating the posture of a mover at each step in
terms of the parts involved in the relevant action.

There are two possible approaches at present available to realize
this. The first one is to specify, totally sequentially, the sets of parts
involved in the relevant action at each timepoint in terms of a
placepoint set. The second one is to show the inclusion relations of
subactions which are sequentially described in terms of the parts
involved in the total action.

5.  PARTS  IN  TERMS  OF  FEATURES 

It is useful to consider how speech sounds are recognized by humans
and described by linguists. In each particular language, despite the
individual difference in the strength, length, pitches, quality, etc. of
the "same" phone produced by speakers, they share some
average/referential sound of psychological reality within their speech
community. To capture this, in phonology, each family of phones
that are approximately equal in articulatory point is selected from
the inventory of all the phones producible by humans and is given a
phonetic value, which is given the label of "phoneme". Furthermore,
in generative phonology, each phone corresponding to one phoneme
is defined as a bundle of distinctive features, which specify a manner
of articulation mainly based upon an articulatory point [3].

This is applicable to the definition of the semantic representation of
action verbs, since the latter is definable in terms of "parts involved
in its "constituent" movements". The difference between these two
cases is that the latter involves "action", the spatiotemporal object
extending over time in such a procedural manner that it requires a
sequential definition of the parts involved in the relevant action.
Although the distinctive features for one sound mainly involve no
factors related to time sequence, since it is defined in terms of a
simultaneous complex of articulatory parts, they provide us a
promising hint toward the dynamic representation of action verbs.
We thus assume that each action consists of a sequence of
subactions (or sets of subactions) definable in terms of the parts
involved in them.

6.  SPATIOTEMPORAL  INDIVIDUALS 
By extending the index introduced in Montague and extended in
Barwise and Perry, which consists of a possible world and an
interval of time [1], we propose a model where the spatiotemporal
structure of an action is described in terms of a sequential set of the
features denoting the parts involved in that action. To enable this,
the location of each individual is established based on
placepoint-timepoint pairs. Thus, each individual's location is
represented as the set of placepoints occupied by it at each
timepoint: Base Function, taking an individual and a timepoint as
arguments, returns the volume occupied by the former at the latter.

7.  BASE  FUNCTION 

In order to specify (sets of) placepoints on individuals, it is
necessary to be able to calculate their location at each timepoint.
Let each world at a timepoint be full of placepoints, with the same
world continuing to exist along an axis of timepoints. At each time
of its existence, each individual occupies a set of placepoints, which
we will henceforth call its "base". To locate individuals' bases at each
time, we define Base Function as:

$\beta(x, y) = z$, where $x \in IND$, $y \in T$, $z \subseteq P$.

With an individual and a timepoint as its arguments, $\beta$ returns the
set of placepoints (i.e. base) occupied by the former at the latter.
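As a rough illustration, Base Function can be pictured as a lookup from (individual, timepoint) pairs to sets of placepoints. In the Python sketch below, the 2-D grid coordinates, the name `OCCUPANCY`, and the sample data are all hypothetical; the paper does not fix a concrete representation of placepoints.

```python
from typing import Dict, FrozenSet, Tuple

Placepoint = Tuple[int, int]      # hypothetical 2-D grid coordinate
Base = FrozenSet[Placepoint]

# Hypothetical observations: (individual, timepoint) -> occupied placepoints.
OCCUPANCY: Dict[Tuple[str, int], Base] = {
    ("ind_a", 1): frozenset({(0, 0), (0, 1), (1, 1)}),
    ("ind_a", 2): frozenset({(1, 0), (1, 1), (2, 1)}),
}


def base(individual: str, timepoint: int) -> Base:
    """Sketch of the base function: placepoints occupied by the individual
    at the timepoint (the empty set if nothing was observed)."""
    return OCCUPANCY.get((individual, timepoint), frozenset())
```

Intersecting the bases of the same individual at two timepoints, for example `base("ind_a", 1) & base("ind_a", 2)`, gives the placepoints that stay occupied across both.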

8.  ORIENTATION  FUNCTION 

We showed that a specific occurrence of action can be represented by
means of Orientation Function [9]. Taking a category, a set of
placepoints and a placepoint as arguments, this returns as value a
content point, whose ordinal number indicates the order in which to
count the input placepoint of the base:

$\omega(x, y, z) = w$,

where $x \in CAT$, $y \in 2^P - \{\emptyset\}$, $z \in P$, $w \in C \cup \{e\}$.

The resulting order is the criterion for orientation by which the
relevant individual is cognized by the observer as falling under the
input category. It was shown that an instance of walking can be
described in terms of a set of sequences of placepoints occupied by
the content points of its agent individual.

9.  THEORETICAL  BACKGROUND 

Accommodating the notion of "feature", introduced in Jakobson,
Fant and Halle [6] and Chomsky and Halle, we proposed elsewhere
that stative categories be defined by sets of features and that relevant
(parts of) individuals be located by Category Function. This, taking
a subset of placepoints and a feature, returns the placepoint set
denoted by that feature:

$\kappa_i(x, y) = z$, where $x \in 2^P - \{\emptyset\}$, $y \in D_i$, $z \subseteq x$ or $z = \emptyset$.

This function enabled us to set up Fulfillment Condition, which
specifies an individual's membership in the set denoted by a category
in the following way: if and only if the union of the subsets of the
base of an individual which correspond to all of the defining features
of a static category is included in that base at a timepoint, the
individual belongs to the relevant category at that time. Taking the
category BIRD as an example, its prototype has a beak, wings,
stick-like legs, etc. The fulfillment condition for BIRD is:

$BIRD(ind_a)$ at $t_b \iff \bigcup_{i=1}^{j} \kappa_{bird}(\beta(ind_a, t_b), d_i) \subseteq \beta(ind_a, t_b)$,

where $D_{bird} = \{d_1{:}\ \text{beak},\ d_2{:}\ \text{wing},\ d_3{:}\ \text{stick-like leg},\ \ldots\}$,

saying that at timepoint b, individual a belongs to the category
"bird" if and only if all of the defining features of this category,
among which are "beak", "wing", "stick-like leg", never fail to find
their location on the base of a at b.
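The Fulfillment Condition for BIRD can be sketched in Python as follows. This is only one reading of the condition: it assumes that a feature which cannot be located yields the empty set and that membership requires every defining feature to be located on the base. The `located` dictionary is a hypothetical stand-in for whatever process finds a feature on a base, and the coordinates are made up.

```python
def category_function(base, feature, located):
    """Sketch of the category function: the part of `base` denoted by
    `feature`, or the empty set when the feature is not found on the base."""
    return frozenset(located.get(feature, ())) & frozenset(base)


def fulfills(base, defining_features, located):
    """True when every defining feature finds a location on the base and
    the union of those locations is included in the base."""
    parts = [category_function(base, d, located) for d in defining_features]
    if any(not part for part in parts):    # some feature has no location
        return False
    return frozenset().union(*parts) <= frozenset(base)


# Hypothetical data for one individual at one timepoint.
D_BIRD = ["beak", "wing", "stick-like leg"]
bird_base = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)}
bird_parts = {"beak": {(0, 0)}, "wing": {(0, 1), (1, 1)}, "stick-like leg": {(2, 0)}}
```

With these values `fulfills(bird_base, D_BIRD, bird_parts)` holds, while dropping any one feature from `bird_parts` makes it fail.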

To be able to identify the posture of a moving individual, we need
some standard with which to measure the moving parts. For this
we invoke the reverse partial function of Orientation Function,
capturing the point-to-point correspondence between the bases of an
individual at two different timepoints.

10.  PROBLEMS 

The first problem addressed here is how to represent the continuous
change in the shape of the moving agent as it performs an action. It
was shown by us that this is done by decomposing each action into
subactions definable in terms of the parts of its agent [10].

The second problem is how to identify various tokens of one action
as belonging to the same category. Two candidates for this task are
possible: to define each action as a sequence of sets of features,
which specifies the parts in motion in order, and to define it as a
complex of subactions by specifying each moving part individually.
For the first algorithm, some device has to be created to focus on the
order of the parts involved in a whole motion, neglecting the real
time lengths between the initial and terminal endpoints of the
subactions involved in instances of the same type of action. To
enable this, Interval Length Function is invoked to "neutralize" the
individual difference in the relative lengths of the intervals which
switch the parts in motion. No interval function nor its equivalent is
needed for the second algorithm, where the order of subactions
involved in each action is sufficient for their sequential
specification. Furthermore, it is necessary to compare the two
devices and set up a relation between them. We will discuss this
problem and give it a tentative answer elsewhere.
11.  INTERVAL  LENGTH  FUNCTION

We must now see an algorithm which, at each interval on the whole
time axis along which an action takes place, specifies a set of
defining features and locates their denotation on the placepoint set
occupied by its agent, i.e. its base. It is important to note, here,
that the 'timing' of changing intervals is not specified, in view of
the fact that agents of the same action usually differ in the lengths
of compatible intervals. This reminds us of the description of speech
sounds in terms of phonemes, which we have seen before. Let us
assume that Interval Length Function $f_{intvl}(x) = y$ takes the
integer x of the x-th interval involved in an action and returns its
length y in terms of the number of the unit intervals constituting
that interval:

x-th interval:            1    2    3    4    ...
(time axis marked at unit timepoints t1, t2, ..., t17, ...)
Length y of interval x:   2    3    6    4    ...

where $f_{intvl}(1) = 2$, $f_{intvl}(2) = 3$, $f_{intvl}(3) = 6$, $f_{intvl}(4) = 4$, ...

If we represent each interval involved in an action as an ordered pair
of its endpoints, the endpoints of the u-th interval $\langle t_q, t_r \rangle_u$
of the whole interval of an action starting at $t_j$ are calculated by
using Interval Length Function as follows:

$\langle t_q, t_r \rangle_u$ such that:

$u = 1$: $\langle q = j,\ r = f_{intvl}(1) + j \rangle$,

$u \geq 2$: $\langle q = \sum_{k=1}^{u-1} f_{intvl}(k) + j,\ r = \sum_{k=1}^{u} f_{intvl}(k) + j \rangle$.

The calculation of the endpoints of each interval is based upon the
fact that the index of the right endpoint is equal to that of the
initial point plus the sum of the lengths of all intervals up to and
including the current one, so that the left endpoint coincides with
the right endpoint of the preceding interval.

In this sense, Interval Length Function plays an essential role in
grouping various tokens of the same action together into one type
of action, which has the invariant denotation of an action verb. By
allowing each interval to range freely, this function helps to capture
the observable fact that the durations of the subactions
constituting a whole action are usually different in its real
occurrences.
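The endpoint calculation can be checked with a few lines of Python. The sketch uses the example lengths 2, 3, 6, 4 and assumes the action starts at timepoint index j = 1; the function name `interval_endpoints` is illustrative only.

```python
def interval_endpoints(u, j, f_intvl):
    """Return (q, r), the endpoint indices of the u-th interval (u >= 1)
    of an action starting at timepoint t_j.

    q is j plus the lengths of the first u-1 intervals;
    r is j plus the lengths of the first u intervals.
    """
    if u < 1:
        raise ValueError("interval index u must be >= 1")
    q = j + sum(f_intvl(k) for k in range(1, u))
    r = j + sum(f_intvl(k) for k in range(1, u + 1))
    return q, r


# Interval lengths from the example: 2, 3, 6 and 4 unit intervals.
lengths = {1: 2, 2: 3, 3: 6, 4: 4}
endpoints = [interval_endpoints(u, 1, lengths.__getitem__) for u in range(1, 5)]
# endpoints == [(1, 3), (3, 6), (6, 12), (12, 16)]
```

Two tokens of the same action with different interval lengths produce different endpoint lists but the same number and order of intervals, which is the invariance the function is meant to expose.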

12.  ACTION  FUNCTION 

Watching an action closely to its end, we can see that its agent
successively changes its posture, during which the location in
motion shifts from one set of parts to another to complete the
action. Thus, each action is defined as a sequence of sets of moving
parts. To take the action of "nodding" as an example: first the head
gets bent forward, second the neck moves forward to some extent
while the head is continuing the same movement, third the
movement affects the shoulder with the other two parts continuing
the same, fourth all these parts halt for a moment, and then they
resume this process backwards. The sequence of moving parts taking
part in "nodding" can be represented as:

nod: $D_n = \langle \{head\}, \{head, neck\}, \{head, neck, shoulder\}, \ldots \rangle$.

The sequential specification of moving parts enables us to define
Action Function. At each interval of the whole duration of an
action, this algorithm specifies a set of defining features and locates
their denotation on the placepoint set occupied by its agent:

$\alpha_i(t_j, \langle t_q, t_r \rangle_u, \kappa_i(\beta(x, y), D_i^u)) = z$, $j \leq q+1 \leq h \leq r$,
where $x \in IND$, $y = t_h \in T$, $z \in 2^P - \{\emptyset\}$,

$u = 1$: $\langle q = j,\ r = f_{intvl}(1) + j \rangle$,

$u \geq 2$: $\langle q = \sum_{k=1}^{u-1} f_{intvl}(k) + j,\ r = \sum_{k=1}^{u} f_{intvl}(k) + j \rangle$.

If the base function, the first argument of the extended category
function, which is in turn the third argument of an action function,
takes an individual and the starting timepoint $t_j$ and calculates its
base, then the action function picks up the first interval (which
contains that timepoint and is of allowably arbitrary length) from
within the overall interval of the action $a_i$ starting at $t_j$, and
returns as its value a subset of the base which corresponds to the
first set of features of the defining sequence for the action $a_i$.
A similar calculation is iterated until the relevant timepoint
exhausts the first interval. Switching to the second interval, the
action function returns a subset of the base for the second feature
set of the defining sequence and each timepoint of the second
interval, a similar process being iterated until the last interval is
reached. Thus, the category function embedded in this function
plays a central role in representing the meaning of each action by
displaying sequentially the denotations of the moving parts of its
agent.
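The interval-by-interval iteration just described might be sketched as below. Several choices are assumptions made for illustration: intervals are treated as half-open so that every timepoint belongs to exactly one interval, `locate` is a hypothetical callback standing in for the category function applied to the agent's base at a timepoint, and the "nodding" data are made up.

```python
def active_interval(h, j, lengths):
    """Return the 1-based index u of the interval containing timepoint h for
    an action starting at t_j, or None if h lies outside the action.
    Assumption: the u-th interval is the half-open range [q, r)."""
    q = j
    for u, length in enumerate(lengths, start=1):
        r = q + length
        if q <= h < r:
            return u
        q = r
    return None


def action_function(h, j, lengths, defining_sequence, locate):
    """Placepoints of the parts expected to be moving at timepoint h:
    the union of the located features in the active interval's feature set."""
    u = active_interval(h, j, lengths)
    if u is None:
        return frozenset()
    parts = frozenset()
    for feature in defining_sequence[u - 1]:
        parts |= locate(feature, h)
    return parts


# Hypothetical defining sequence for "nodding" and fixed part locations.
NOD = [{"head"}, {"head", "neck"}, {"head", "neck", "shoulder"}]
NOD_LENGTHS = [2, 3, 6]
PARTS = {
    "head": frozenset({(0, 3)}),
    "neck": frozenset({(0, 2)}),
    "shoulder": frozenset({(0, 1), (1, 1)}),
}


def locate(feature, h):
    # Stand-in: real locations would depend on the agent's base at time h.
    return PARTS[feature]
```

Calling `action_function(4, 1, NOD_LENGTHS, NOD, locate)` falls in the second interval and so returns the placepoints of the head and the neck.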

13.    CATEGORY  FULFILLMENT 

Using Action Function, we set up Dynamic Category Fulfillment
Condition, an interpretation mechanism which checks if a candidate
situation falls under the relevant category of action. If, by
decomposing an action into several intervals, we can find that the
parts of its agent which have changed their positions during each
i-th interval contain the parts specified by the value returned from an
action function with that i-th set of features as its second argument,
then we can decide that the movement belongs to the category
denoted by that function:

$\{\, y \mid \langle x, y \rangle \in (\omega(IND, \beta(ind_a, t_{k+1})))^{-1} - (\omega(IND, \beta(ind_a, t_k)))^{-1} \,\}$
$\supseteq \alpha_i(t_j, \langle t_q, t_r \rangle_u, \kappa_i(\beta(ind_a, t_{k+1}), D_i^u))$.

This is a recognition device which, given a complex of actions
performed by an individual, judges whether the complex contains the
specified action. Then, how are we to represent actions in these
terms?
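One way to picture a single step of this recognition device is sketched below, under a particular reading of the condition: the orientation at each timepoint is modeled as a dictionary from placepoints to content-point labels, the placepoints that changed are those paired with a content point at the later timepoint but not at the earlier one, and the step succeeds when they cover the part returned by the action function. The data layout and function names are assumptions, not the paper's definitions.

```python
def moved_placepoints(orient_prev, orient_next):
    """Placepoints newly occupied by some content point between two adjacent
    timepoints. Each argument maps placepoint -> content point label."""
    inv_prev = {(c, p) for p, c in orient_prev.items()}
    inv_next = {(c, p) for p, c in orient_next.items()}
    return frozenset(p for (_c, p) in inv_next - inv_prev)


def step_fulfilled(orient_prev, orient_next, required_part):
    """True when the moved placepoints include the required part."""
    return moved_placepoints(orient_prev, orient_next) >= frozenset(required_part)


# Hypothetical agent with three content points; only c3 moves.
at_tk = {(0, 0): "c1", (0, 1): "c2", (0, 2): "c3"}
at_tk1 = {(0, 0): "c1", (0, 1): "c2", (1, 2): "c3"}
```

Here only content point `c3` changes its placepoint, so a step that requires movement at `(1, 2)` is fulfilled, while one that requires movement at `(0, 1)` is not.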

14.    ACTION  FORMULA 

In  contrast  to  the  sequential  specification  of  the  subsets  of  defining 
features  denoting  the  parts  involved  in  motion,  we  can  set  up 
another  algorithm,  Action  Formula.  This  specifies  the  defining 
features  as  they  are  individually  without  reorganizing  them  into 
their  subsets: 

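$ACTION_i^{ind_a, \langle t_q, t_r \rangle}(d_j)$

$= \langle \langle P'_{q_j}, P'_{q_j+1} \rangle_1, \langle P'_{q_j+1}, P'_{q_j+2} \rangle_2, \ldots, \langle P'_{r_j-1}, P'_{r_j} \rangle_{r_j-q_j} \rangle$.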

The  domain  of  this  function  is  limited  to  the  set  of  the  defining 
features,  which  denotes  the  moving  parts  involved  in  the  relevant 
action.  If  any  of  the  features  for  that  action  is  taken  as  its  value,  the 
function  returns  a  sequence  of  pairs  of  placepoint  sets  occupied  by 
the  agent,  over  the  maximal  interval  during  which  the  part  denoted 
by  that  feature  actually  is  continuing  some  motion.  The  set- 
theoretical  definition  of  this  function  is  done  by  applying 
recursively  the  adjacent  movement  function: 

ACTlONi  indJ'-tW^ 

=deji  <*<  <  <y>Z>\Uwda  <V<M>(y)=Z  > 

t_l 

bc=(  co  '(INDIVIDUAL,  j3(indaJi)))(y)t  *  <E  D* 
y  €  2  8( -{  4>  },qj^k-£rr  1  ID.-I  >  }. 

The returned value shows the series of changes in the figure of
a moving part over the interval during which that subaction is
ongoing, in terms of a sequence of pairs of placepoint sets at
adjacent timepoints. The relative order of the subactions of an
action is defined by the endpoints of the inner interval.

Let us, here, define Set Trace Function, which, with its fixed
individual and time interval, takes any subset of content points for
the category INDIVIDUAL and returns a sequence of pairs denoting
the transition, over that interval, of the part denoted by that subset
of content points:

$ST^{ind_a, \langle t_q, t_r \rangle}(C')$
$= \langle \langle P'_q, P'_{q+1} \rangle_1, \langle P'_{q+1}, P'_{q+2} \rangle_2, \ldots, \langle P'_{r-1}, P'_r \rangle_{r-q} \rangle$.

The transition shows all the possible changes in the figure of the
relevant part.

15.  ACTION  CONTAINMENT 

To check if an individual has performed an action over an interval,
we have to find that for every sequence returned from the relevant
action formula with a feature as its argument, there is some
sequence which is returned from the relevant set trace function with
its argument corresponding to the part denoted by the feature and
which includes the former sequence as a subset. For a specific
individual over a specific interval, the following holds:

$\forall y_1 [\, ACTION_i^{ind_a, \langle t_q, t_r \rangle}(x) = y_1 \rightarrow \exists y_2 [\, ST^{ind_a, \langle t_q, t_r \rangle}(z) = y_2$
$\ \&\ y_1 \subseteq y_2$
$\ \&\ \omega'(IND, \beta(ind_a, t_k), \kappa_i(\beta(ind_a, t_k), x)) = z \,]\,]$,
where $q_j \leq k \leq r_j$.

This denotes how the transition of each part of the agent, over the
interval of motion, is included in its transition over the whole
interval. Simplifying its notation by deleting the details, we can
express the universal quantification as an inclusion relation. See the
following:

For every $d \in D_i$, $ACTION_i^{ind_a, \langle t_q, t_r \rangle}(d) \subseteq ST^{ind_a, \langle t_q, t_r \rangle}(C')$,
where $\omega'(IND, \beta(ind_a, t_k), \kappa_i(\beta(ind_a, t_k), d)) = C'$.

16.  CONCLUDING  REMARKS 

The  above  setting  enables  us  to  interpret  sentences  of  action  in  two 
compositional  ways.  Limiting  to  a  simple  intransitive  case, 
functions  and  conditions  are  to  be  applied  roughly  as  follows: 

i)  a. [S [NP agent's base] [VP [V Action Function]]]
    b. [S [NP agent's base] [VP Action Function]]
    c. [S Dynamic Category Fulfillment Condition]

ii) a. [S [NP agent's base] [VP [V Action Formula]]]
    b. [S [NP agent's base] [VP Action Formula]]
    c. [S Action Containment]

We have shown the possibility that the human recognition process, not
only of static categories but also of dynamic ones like action verbs,
is performed by some place-locating mechanism significantly based
upon vision as its input. If this mechanism is to be fully
understood, then it will provide some evidence that the human
cognitive ability to categorize is actually driven or supported by a
comparatively simple physical system.

We have seen two alternatives based upon the categorization
algorithm in which several functions sequentially detect the parts
involved in ongoing actions by specifying them in terms of
place-point sets. Does the physical cognitive system, then, favor one
over the other? We have no biologically oriented data to choose
between them.

From  a  linguistically  semantic  viewpoint,  however,  there  is  a 
possibility  that  the  first  one  provides  a  model  for  interpretation  and 
the  second  forms  a  representation  which  is  at  least  partially 
compositionally  calculated  from  the  syntax.  We  will  discuss  this 
point  elsewhere. 

We still have more questions to answer and problems to solve.
Some of them are the following: how to semantically represent
transitive cases, for which we have been developing an algorithm,
and what happens if an agent performs an action on an independently
moving object, for which we have to set up an embedding device where
the Orientation Function applies multiply.

Abstract

This paper presents a new algorithm for solving the
forward kinematics of the Stewart Platform by using
Focusing Attention and Searching. Previous algorithms
solve this problem by numerically finding the
solution for a 16th degree polynomial. We will show
that it is possible to drastically reduce the search space
by representing the system with three angles around
the circles formed by the given lengths. Then, we
search in a nonlinear equation with one unknown and
two roots. Reduction of computational complexity allows
us to use the feed-forward kinematics transform
in a broader scope of practical cases.

I.  Introduction 

In this paper, we demonstrate that no matter how
complex the problem is, local focusing of attention and
searching can be used to substantially simplify the effort.
This paradigm of dealing with problems (outlined
in [1, 2]) can be implemented in many cases. The
Stewart Platform was introduced in [3] and further
developed in [4, 5]. It is a closed kinematic system
with parallel links, which is considered to be far more
rigid than its serial counterparts of the same size and
weight. Its force-output-to-manipulator-ratio is generally
an order of magnitude bigger than that of most industrial
robots [6]. The same closed kinematic structure that
gives it its rigidity also complicates the solution of the
forward kinematics in such a way that no closed-form
solution for this problem has been found. Some authors
go as far as to say that "due to the kinematic
nature of this procedure, it is impossible to compute
the kinematics solution on line" [6].

To solve this problem, some researchers have transformed
it into solving 30 non-algebraic simultaneous
equations [7]. More recent results include the solution
of a 24th [8], 16th [9, 10], or 12th [11] order polynomial
in a single variable. Finally, [6] presents an algorithm
that involves three non-linear simultaneous equations
with three unknowns. [12] shows that some of these
algorithms can be solved in parallel, thus reducing the
time of computation.

Our algorithm has a number of advantages in comparison
with the existing solutions. It uses a nonlinear
equation with one unknown and two roots. It is not
necessary to compute all 16 mathematically possible
solutions [11] for the platform. One should be sufficient,
considering that the structure of the platform is known
and that we make sure that we are inside some regions
of existence that we will define later.

The results can be used for real-time control applications
in a variety of settings. Control structures are
presently being developed.

II. Forward Kinematics and its Need for
Control and Measurement

Forward and inverse kinematics are terms that describe
the mapping from the space of inputs to the space of
outputs of a non-dynamic mechanical system, as shown
in Figure 1. Figure 2 shows a typical application of
the forward and inverse kinematics algorithms. A trajectory,
such as that of a milling machine or a welding system,
is to be followed. The controller, using the inverse
kinematics algorithm (and calculating desired changes
in position), computes the control signal given to the
actuators. At the output of the platform we measure
the lengths of the edges controlled by the actuators.
Then, the forward kinematics algorithm transforms
those lengths into positions. The position signal is compared
to the assigned position and added to the controller's
input. Because of its closed kinematic chain,
the rigidity of the end effector of the Stewart platform
allows a reasonably accurate feed-forward assignment.
Due to the complexity of older forward kinematics algorithms,
most Stewart platforms are controlled without
using feedback. The proposed algorithm will allow
the calculation of the forward kinematics in real time.
This opens an opportunity to increase the accuracy
of the control algorithms and to broaden the current
uses of the platform.

There are other applications where the solution of
forward kinematics is mandatory. These are the cases
when the Stewart platform is used as a component of
a measuring device. These cases include:

•  camera mapping, where a camera is mounted on
top of a Stewart platform. For example, we let the
camera roam until it finds a target, and then we
ask where the camera is pointed. It is also useful
when we have a fast control system to stabilize
the camera and then have the forward kinematics
algorithm tell where the camera is pointed (see
Figure 3),

•  positioning devices (6 degrees of freedom joysticks),
and

•  inspection systems mounted on Stewart platforms
that could be used to touch the part to inspect it.


We were unable to find any examples where the Stewart
platform was used for any of these purposes in industry,
although the use of parallel manipulators for
sensing is very common in nature (e.g., our necks and
eyes). We can conjecture that these structural solutions
did not emerge in industry because the previous
forward kinematics algorithms were not fast enough to
allow for real time operation of these devices.

In [13] and [14], new sensors were added to the parallel
manipulators in order to find the forward kinematics
in a faster way. We expect our algorithm to
decrease the manufacturing cost of these platforms by
solving the problem without the need to add extra sensors.

In this paper, we will first show how this algorithm
is used to solve the forward kinematics of the 3-3 Stewart
manipulator (three joints in the fixed platform and
three joints in the moving platform). See Figure 4.
Later we show how this algorithm can be extended to
other parallel platforms.

In Figure 5, the RST triangle is the fixed platform
and the ABC triangle is the moving platform. The
lengths of the six edges controlled by the actuators of
the platform are $\ell_1, \ldots, \ell_6$ (Figure 4). The positions of
R, S, and T are constant and given. If we draw the
locus of the point A (Figure 5), it will be
a circle (oA), since the segments RS, SA = $\ell_2$, and
RA = $\ell_1$ are given. This circle can be created by
rotating the triangle ARS around RS. In the same
manner, oB and oC are created by rotating B around
ST, and C around RT, respectively. For simplicity, let
AB = AC = BC = b and RS = RT = ST = a.
By looking at Figure 5, it is possible to see that the
forward kinematics algorithm must find a point on
each circle such that the distances among these points
match the edge length of the moving platform.

III.  The  Algorithm  of  Forward  Kinematics 

Therefore, the algorithm in its general form works
in the following manner (Figure 6):

1. The procedure starts by finding the range of the
real solutions. In order to explain how this is done,
we first outline the algorithm.

2. Pick a point A' in oA (see Figure 7). A' is completely
defined by oA and $\theta_A$, the polar angle inside
that circle. For now, A' is an arbitrary point.
Later, we show that it is possible to analytically
find a region where the solutions must exist.

3. Analytically find a point B' in oB such that
A'B' = b. We will see that there are two analytical
solutions. Let us name them $B'_+$ and $B'_-$
(for the positive and negative slope of the A'B' versus
$\theta_A$ curve, top-left of Figure 9; for now, assume
that these are arbitrary labels).

4. Analytically find a point C' in oC such that
A'C' = b. Again, there are two analytical solutions.
Let us name them $C'_+$ and $C'_-$ (for the positive
and negative slope of the A'C' versus $\theta_B$ curve).

5. At this point, we have found five points such that
$A'B'_+ = A'B'_- = A'C'_+ = A'C'_- = b$, but
we are still not sure that $B'_+C'_+ = b$, $B'_+C'_- = b$,
$B'_-C'_+ = b$, or $B'_-C'_- = b$. We can create four functions:
$f_1(\theta_A)$ (length of $B'_+C'_+$ as a function of $\theta_A$),
$f_2(\theta_A)$ (length of $B'_+C'_-$ as a function of $\theta_A$),
$f_3(\theta_A)$ (length of $B'_-C'_+$ as a function of $\theta_A$),
and $f_4(\theta_A)$ (length of $B'_-C'_-$ as a function of
$\theta_A$). Examples of these functions are shown in
Figure 8. The algorithm then applies the
Newton-Raphson search procedure to each function
individually and finds the value of $\theta_A$ for which
B'C' = b to the desired accuracy. These functions are
periodic, and each has at most two roots in each
half period, thus yielding 8 forward kinematic
solutions: $ABC_{+++}$ being the positive slope solution
of $f_1(\theta_A)$, and $ABC_{++-}$ being the negative
edge of the same function. Similarly, $ABC_{+-+}$,
$ABC_{+--}$, $ABC_{-++}$, $ABC_{-+-}$, $ABC_{--+}$, and
$ABC_{---}$ are the other six solutions, for a total
of eight solutions (for the half period of $\theta_A$, see
Figure 9).

Figure 4: 3-3 Stewart Platform showing the actuators

Figure 5: 3-3 Stewart Platform showing oA, oB, and oC

Figure 6: A general view of the forward kinematics
algorithm

Figure 7: Searching for the correct b lengths

Figure 8: $f_1(\theta_A)$, $f_2(\theta_A)$, $f_3(\theta_A)$, and $f_4(\theta_A)$ for a b = 5
example

It is unnecessary to search all four functions unless
we are interested in finding all 8 (top) solutions. The
reason is that if we know how our platform is currently
configured (e.g., in the ABC+ configuration), it is easy to prove
that the next solution will also be an ABC+ solution, provided
that we have not passed through a maximum, which is
actually a singularity point. These singularity points can
be easily derived from the expressions for A'B' and
A'C'. For each concrete configuration, a search strategy
can be assigned in such a way that the complexity
of computation is minimized.

IV.  The  Equations  of  the  System 

We can write the equations of the system in the following
way:

\[ \ell_1 = \sqrt{(x_A - x_R)^2 + (y_A - y_R)^2 + (z_A - z_R)^2} \tag{1} \]
\[ \ell_2 = \sqrt{(x_A - x_S)^2 + (y_A - y_S)^2 + (z_A - z_S)^2} \tag{2} \]
\[ \ell_3 = \sqrt{(x_C - x_R)^2 + (y_C - y_R)^2 + (z_C - z_R)^2} \tag{3} \]
\[ \ell_4 = \sqrt{(x_C - x_T)^2 + (y_C - y_T)^2 + (z_C - z_T)^2} \tag{4} \]
\[ \ell_5 = \sqrt{(x_B - x_S)^2 + (y_B - y_S)^2 + (z_B - z_S)^2} \tag{5} \]
\[ \ell_6 = \sqrt{(x_B - x_T)^2 + (y_B - y_T)^2 + (z_B - z_T)^2} \tag{6} \]

Figure 9: 8 solutions in the Stewart Platform

Equations (1) through (6) are the Euclidean distance equations
between the three points that define the lower platform, R, S,
and T, and the points of the upper moving platform, A, B, and C.
We can also write the Euclidean distance equations of the top
platform. Here, we assume that it is an equilateral triangle,
but we are not making any simplifications based on this
assumption.

\[ b^2 = (x_A - x_B)^2 + (y_A - y_B)^2 + (z_A - z_B)^2 \tag{7} \]
\[ b^2 = (x_A - x_C)^2 + (y_A - y_C)^2 + (z_A - z_C)^2 \tag{8} \]
\[ b^2 = (x_B - x_C)^2 + (y_B - y_C)^2 + (z_B - z_C)^2 \tag{9} \]

where b is the edge length of the equilateral triangle of the
top platform. We create a new reference coordinate
system on S (Figure 10) to simplify the expressions as
follows:


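\[ x_R = a, \qquad y_R = 0, \qquad z_R = 0 \tag{10} \]
\[ x_S = 0, \qquad y_S = 0, \qquad z_S = 0 \tag{11} \]
\[ x_T = \frac{a}{2}, \qquad y_T = \frac{\sqrt{3}\,a}{2}, \qquad z_T = 0 \tag{12} \]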

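As shown in Figure 10, we define:

\[ p_A = \frac{\ell_1^2 - \ell_2^2 + a^2}{2a}, \qquad r_A = \sqrt{\ell_1^2 - p_A^2} \tag{13} \]
\[ p_B = \frac{\ell_5^2 - \ell_6^2 + a^2}{2a}, \qquad r_B = \sqrt{\ell_5^2 - p_B^2} \tag{14} \]
\[ p_C = \frac{\ell_4^2 - \ell_3^2 + a^2}{2a}, \qquad r_C = \sqrt{\ell_4^2 - p_C^2} \tag{15} \]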


By looking at Figure 10 and by using definitions (13)
through (15), we can easily write a polar parametric
equation of oA in the following manner:

\[ x_A = a - p_A \tag{16} \]
\[ y_A = r_A \cos\theta_A \tag{17} \]
\[ z_A = r_A \sin\theta_A \tag{18} \]

Figure 10: New coordinate system and some definitions


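In a similar way, after some trigonometric transformations,
we obtain the equations for oB and oC:

\[ x_B = \frac{\sqrt{3}\, r_B \cos\theta_B + p_B}{2} \tag{19} \]
\[ y_B = \frac{-r_B \cos\theta_B + \sqrt{3}\, p_B}{2} \tag{20} \]
\[ z_B = r_B \sin\theta_B \tag{21} \]
\[ x_C = a - \frac{\sqrt{3}\, r_C \cos\theta_C + a - p_C}{2} \tag{22} \]
\[ y_C = \frac{-r_C \cos\theta_C + \sqrt{3}\,(a - p_C)}{2} \tag{23} \]
\[ z_C = r_C \sin\theta_C \tag{24} \]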
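
The circle parameterization can be checked numerically with a short sketch. It assumes the coordinate system of (10) through (12), that the leg lengths pair with the joints as in (1) through (6), and that the centre offsets are measured from R for oA, from S for oB, and from T for oC; the last choice is an assumption made here so that the leg-length constraints hold. The function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)

def circle_params(d_near, d_far, a):
    """Offset p of the circle centre from the 'near' base joint, measured
    along the base edge of length a, and the circle radius r, for a platform
    joint at distances d_near and d_far from the two base joints."""
    p = (d_near ** 2 - d_far ** 2 + a ** 2) / (2.0 * a)
    r_sq = d_near ** 2 - p ** 2
    if r_sq < 0.0:
        raise ValueError("leg lengths are incompatible with the base edge")
    return p, math.sqrt(r_sq)

def point_a(l1, l2, a, theta):
    """Point on oA: l1 = RA, l2 = SA, offset measured from R."""
    p, r = circle_params(l1, l2, a)
    return (a - p, r * math.cos(theta), r * math.sin(theta))

def point_b(l5, l6, a, theta):
    """Point on oB: l5 = SB, l6 = TB, offset measured from S."""
    p, r = circle_params(l5, l6, a)
    c = math.cos(theta)
    return ((SQRT3 * r * c + p) / 2.0,
            (-r * c + SQRT3 * p) / 2.0,
            r * math.sin(theta))

def point_c(l3, l4, a, theta):
    """Point on oC: l3 = RC, l4 = TC, offset measured from T (assumption)."""
    p, r = circle_params(l4, l3, a)
    c = math.cos(theta)
    return (a - (SQRT3 * r * c + a - p) / 2.0,
            (-r * c + SQRT3 * (a - p)) / 2.0,
            r * math.sin(theta))
```

Whatever polar angle is chosen, each returned point keeps its two leg lengths fixed, which is the property the search over $\theta_A$ relies on.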


Let us assume that we have chosen a $\theta_A$ in oA, which, through
equations (16) through (18), uniquely defines a
point A in oA. We are interested in finding the points
in oB that are a distance b away from A. Two methods
were tested for accomplishing this, both with positive
results:

Method 1: Solving the 2nd order polynomial

Method 2: Intersecting a sphere and a circle

The intersection between the sphere and oB is
shown in Figure 11. Note that the only ways for a
platform to go from $\theta_{B+}$ to $\theta_{B-}$ are that:

•  the platform has gone through a singularity point
($\theta_{B+} = \theta_{B-}$). These points are at the maxima
of the curves in Figure 8 or 9,

•  one (or more) of the cables has gone slack, if the
Stewart platform has flexible members, or

•  an actuator has broken, if the platform has rigid
members.

It is only necessary to calculate one $\theta_B$ and one $\theta_C$
for a real platform. To start our search technique, we
calculate the distance between B and C, which is
given by (9).

Figure 11: A new coordinate system
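
A minimal sketch of Method 2 follows, under stated assumptions: each circle is described by a centre, two orthonormal in-plane axes `u` and `v`, and a radius (a generic representation chosen here, not the paper's coordinate formulas), and the two branches are labelled by the sign of the angular offset rather than by the slope convention used in the text. Substituting the circle point into the sphere equation gives $P\cos\theta + Q\sin\theta = K$, which has at most two solutions.

```python
import math

def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))

def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def circle_point(center, u, v, r, theta):
    """Point at polar angle theta on the circle with the given centre,
    radius r, and orthonormal in-plane axes u, v."""
    c, s = math.cos(theta), math.sin(theta)
    return tuple(ci + r * (c * ui + s * vi)
                 for ci, ui, vi in zip(center, u, v))

def sphere_circle_angles(A, b, center, u, v, r):
    """Polar angles of the circle points at distance b from A.

    Returns a pair (theta_plus, theta_minus), or None when the sphere of
    radius b around A misses the circle or the problem is degenerate.
    """
    d = _sub(A, center)
    P, Q = _dot(d, u), _dot(d, v)
    K = (_dot(d, d) + r * r - b * b) / (2.0 * r)
    H = math.hypot(P, Q)
    if H == 0.0 or abs(K) > H:
        return None
    phi = math.atan2(Q, P)
    delta = math.acos(K / H)
    return phi + delta, phi - delta

def bc_distance(A, b, circle_B, circle_C, branch_B=0, branch_C=0):
    """Distance B'C' for a given A', using the selected branch (0 or 1)
    on each circle; each circle is a tuple (center, u, v, r)."""
    angles_B = sphere_circle_angles(A, b, *circle_B)
    angles_C = sphere_circle_angles(A, b, *circle_C)
    if angles_B is None or angles_C is None:
        return None
    B = circle_point(*circle_B, angles_B[branch_B])
    C = circle_point(*circle_C, angles_C[branch_C])
    return math.dist(B, C)
```

Evaluating `bc_distance` for the four branch combinations while A' moves along oA gives numerical counterparts of the four functions of $\theta_A$ described in Section III.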

V.  The  Search  Technique 

Previously, we presented an expression that, given $\theta_A$,
gives the distance B'C', or $f(\theta_A)$. We need to find
$\theta_A$ such that $f(\theta_A) = b$. There are several search
techniques that can be used. The search problem has
the following features:

1. the region of search is known,

2. there are no more than two solutions, and

3. there is only one maximum.

One simple way of finding $\theta_A$ such that $f(\theta_A) = b$
consists of the following steps:

1. find the maximum, using Newton-Raphson search,

2. divide the space of search into two parts: to the left
and to the right of the maximum,

3. search for the root of the equation in only one of
the parts.

If  we  have  the  previous  forward  kinematics  solution,  we 
can  use  that  value  as  a  starting  point  for  our  search, 
thus  avoiding  the  need  to  find  the  maximum. 
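
The search strategy can be sketched as follows. This is an illustration under assumptions, not the implementation that was timed: the maximum is located here by golden-section search and the root by bisection, which are swapped in for the Newton-Raphson search named above because they need no derivative; the warm start from a previous solution uses a Newton-Raphson step with a finite-difference slope. All function names are hypothetical, and `f` stands for any function with a single maximum on the search region.

```python
import math

def find_maximum(f, lo, hi, tol=1e-8):
    """Golden-section search for the single maximum of f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if f(c) < f(d):
            lo = c          # the maximum lies to the right of c
        else:
            hi = d          # the maximum lies to the left of d
    return 0.5 * (lo + hi)

def bisect_root(g, lo, hi, tol=1e-10):
    """Bisection for a root of g on [lo, hi]; g must change sign there."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0.0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

def solve_theta(f, b, lo, hi, side="left"):
    """Find theta with f(theta) = b on one side of the maximum."""
    peak = find_maximum(f, lo, hi)
    if side == "left":
        return bisect_root(lambda t: f(t) - b, lo, peak)
    return bisect_root(lambda t: f(t) - b, peak, hi)

def newton_from_previous(f, b, theta0, h=1e-6, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration started from the previous solution theta0,
    using a central finite-difference estimate of the slope."""
    theta = theta0
    for _ in range(max_iter):
        err = f(theta) - b
        if abs(err) < tol:
            break
        slope = (f(theta + h) - f(theta - h)) / (2.0 * h)
        if slope == 0.0:
            raise ZeroDivisionError("zero slope: theta is at a maximum")
        theta -= err / slope
    return theta
```

Restricting the root search to one side of the maximum keeps the function monotonic on the bracket, so the search cannot move to the other root.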

VI.  Implementation 

The previous algorithm was compiled on two different
computers: a Sun Sparc 2 and a Sun Sparc 20.
Table 1 shows the different average times of calculation.
Table 2 shows the time that the algorithm takes
with respect to the desired accuracy of the result.

The algorithm of forward kinematics was tested using
data from a Stewart based milling machine (see
Figure 12). The trajectory is the spiral shown in Figures
13 and 14, which show the position trajectory and the
unit vectors giving the direction of the end effector, respectively.

Hardware                                       Average time (μseconds)
Sun Sparc 20 with compiler optimization        98
Sun Sparc 20 without compiler optimization     121
Sun Sparc 2 with compiler optimization         490
Sun Sparc 2 without compiler optimization      512

Table 1: Comparative study of different hardware to
find the complete forward kinematic solution with an
accuracy of ±0.01 radians and the Newton-Raphson
search method


Accuracy (± radians)     Typical time (μseconds)
0.01                     98
0.001                    136
0.0001                   250
0.00001                  492

Table 2: Typical algorithm times using a Sun Sparc
20 and the "-O4 -msupersparc" compiling options
with different accuracies and the Newton-Raphson
search method.


Figure 12: Ingersoll-NIST Hexapod


Figure  13:  Spiraling  trajectory 




Figure  14:  Angular  trajectory 


Figure 15 shows the four sets of curves created
by plotting the distance BC when AC = AB = b
as a function of $\theta_A$ along the spiral trajectory. Each
different curve corresponds to the distance (BC) for
a point along the trajectory shown in Figures 13 and
14. Note that the algorithm does not need to draw
this curve to find the solution. It searches these
curves to find the BC = b solution that corresponds
to the current configuration of the platform. Figure 16
is a two-dimensional view of Figure 15 that shows the
found solutions marked in the bottom left curve.

VII.  Conclusions 


1. An algorithm of forward kinematics for the Stewart
Platform which allows for substantial reduction
of computational complexity is proposed.

2. Unlike many existing systems, this algorithm
knows the number of solutions in advance. Since
the previous configuration is considered part of
the assignment, we need to search for only one
root in most cases.

3. In addition to reducing complexity, the algorithm
is easy to implement.

4. Due to the constrained and monotonic character of
the functions in the search zone, the search algorithm
is stable (it cannot jump roots).

5. The algorithm allows for real time applications.
Further reduction of computational time can be
achieved by introducing parallel computation.

6. Unlike some of the existing algorithms, the proposed
one does not depend on the length of the
edges.

Figure 15: BC along the spiral trajectory

Figure 16: BC along the spiral trajectory in 2d

Abstract

That  branch  of  semiotics  called  semantics 
deals  with  the  relationships  between 
meanings  and  representations.  In  my  view 
meanings  exist  only  in  brains  without 
representations  there.  A  meaning  is  the  focus 
of  an  activity  pattern  that  may  occupy  the 
entire  available  brain.  It  is  constructed  by 
intentional  action  followed  by  learning  from 
the  consequences  of  the  action. 
Communication  between  brains  requires 
reciprocal  construction  of  representations  in 
accordance  with  meanings,  which  elicit  other 
meanings  in  other  brains.  A  representation, 
as  a  material  object  or  process,  has  no 
meaning  in  itself.  EEG  data  indicate  that 
neural  patterns  of  meaning  occur  in 
trajectories  of  discrete  steps  marked  by 
cortical  state  transitions,  served  by  rapid 
exchanges  of  discrete  wave  packets  between 
interactive  cortical  domains.  These  wave 
packets  are  made  by  self-organizing 
dynamics  that  control  behavior  and  shape  the 
sensitivities  of  sensory  cortices  to  the 
sequelae of actions. The nature of causality
must  be  analyzed  and  understood  in 
accordance  with  the  questions:  How  do 
meanings  cause  representations,  and  how  do 
representations  cause  meanings? 

Introduction 

Archeologists  studying  a  petroglyph  ask  not 
only  who  did  it  and  when,  but  what  does  it 
mean?  They  conclude  that  no  one  can  know 
what  it  means  now,  and  that  they  can  only 
speculate  on  the  prior  meanings  in  the  minds 
of  the  makers  and  viewers.  The  lesson  is  that 
the  petroglyph  contains  no  meaning,  even 


though  it  was  made  by  humans  with  intent  to 
communicate  meaning  by  evoking  the 
formation  of  comparable  meanings  in  other 
humans. 

Engineers  who  want  to  make  semantic 
machines  are  faced  with  the  task  of  defining 
meaning,  which  at  present  exists  only  in 
brains,  and  then  with  the  task  of  learning 
how  to  make  or  cause  meaning  in  machines, 
as  shown  by  Tani  [1996]  and  Clark  [1997]. 
The  requirements  on  network  models  to 
simulate  the  chaotic  dynamics  of  brains 
include  global  though  sparse  connectivity, 
continuous  time  dynamics,  and  distributed 
spatial  functions  in  two-dimensional  arrays  of 
nonlinear  integrators.  Digital  hardware  may 
suffice  to  emulate  the  biological  functions  of 
sensory  cortex  in  brains  by  use  of  nonlinear 
difference  equations  [Chang  and  Freeman, 
1996;  Shimoide  and  Freeman,  1995; 
Freeman  et  al.,  1996].  In  this  way,  a  next 
step  to  machine  intelligence  may  be  to  use  a 
model  of  a  sensory  cortex  as  an  interface 
between  the  unconstrained  real  world,  which 
is  infinitely  complex,  and  the  finite  state 
automata  that  constitute  the  main  support  for 
contemporary  artificial  intelligence.  That  is, 
models  from  brain  dynamics  can  provide 
eyes  and  ears  for  conventional  computers. 

However,  this  step  will  require  that  a  major 
problem  be  addressed:  the  relation  between 
representation  and  meaning  in  brain  function. 
Shannon-Weaver  information  theory,  which 
is  representational,  has  divorced  meaning 
from  information  and  therefore  does  not 
apply to brains. The aim of this presentation
is to sketch some of the principal elements of
the problem, as a basis for discussing some
possible pathways toward solutions through
a better understanding of the biological basis
of meaning as relations between brain states
and behavioral actions, not between symbols
in syntactical systems.

Figure 1 (labels: Representation A, Representation B)

Communication  by  representations 

Operational  discreteness  is  essential  for 
communication  in  dialogue.  A  pair  of  brains 
can  act,  sense,  and  construct  in  alternation 
with  respect  to  each  other,  not  merely  as  dogs 
sniff,  but  as  two  humans  speak,  listen,  and 
hear.  Consider  brains  A  and  B  interacting 
(Figure  1),  where  A-B  are  parent-child, 
wife-husband,  rabbit-dog,  philosopher- 
biologist,  neuroscientist-rabbit,  etc.  A  has  a 
thought  that  constitutes  some  meaning  M(a). 
In  accordance  with  this  meaning  A  acts  to 
shape  a  bit  of  matter  in  the  world  (a  trace  of 
ink  on  paper,  a  vibration  of  air,  a  set  of 
keystrokes  on  e-mail,  movements  of  the  face, 
etc.)  to  create  a  representation  (a  sign  or 
symbol  for  humans,  merely  a  sign  for 
animals)  directed  at  B,  R(ab).  B  is  impacted 
by  this  shaped  matter  and  is  induced  by 
thought  to  create  a  meaning  M(b).  So  B  acts 
to  shape  a  bit  of  matter  in  accordance  with 
M(b)  in  a  representation  R(ba),  which 
impacts  on  A  to  induce  M(a+1). 

And  so  on.  Already  by  this  description  there 
is  implicit  recognition  of  a  discrete  ebb  and 
flow  of  conversation  like  recurrence  of  tides, 
so  that  meanings  M(i)'s  as  constructions  of 
thoughts  become  the  internal  active  states, 


and  the  R(ij)'s  as  attributes  of  matter  become 
the  external  representations.  By  its  nature  an 
external representation can be used over
and  over.  It  cannot  be  said  to  contain  or 
carry  meaning,  since  the  meanings  are  located 
uniquely  inside  A  and  B  and  not  between 
them. Moreover, the same R's induce
different  meanings  M(i)  in  other  subjects  C 
who  intercept  the  representations.  The 
objects  that  are  used  to  communicate  are 
shaped  by  meanings  that  are  constructed  in  A 
and  B  iteratively  and  induce  the  constructions 
of  meaning  in  B  and  A  alternately.  If 
communication  is  successful,  then  the 
internal  meanings  will  come  transiently  into 
harmony,  as  manifested  by  cooperative 
behavior  such  as  dancing,  walking  in  step, 
shaking  hands,  exchanging  bread,  etc. 
Symbols  can  persist  like  books  and  stone 
tablets,  while  minds  fluctuate  and  evolve  until 
they  die. 


Observations  of 
electroencephalograms 

A  biological  approach  to  the  problem  of 
meaning  is  to  study  the  evolution  of  minds 
and  brains,  on  the  premiss  that  animals  have 
minds  that  are  prototypic  of  our  own,  and 
that  their  brains  and  behaviors  tell  us  what 
essential  properties  are  common  to  their 
minds and to our own minds.

Experimental measurements of brain activity
(EEG)  that  follows  sensory  stimulation  of 
animals  show  that  sensory  cortices  engage  in 
construction  of  activity  patterns  in  response 
to  stimuli  [Freeman,  1975].  The  operations 
are  not  those  of  filter,  storage,  retrieval,  or 
correlation  mechanisms.  Each  construction  is 
by  a  state  transition,  in  which  a  sensory 
cortex  switches  abruptly  from  one  basin  of 
attraction  to  another,  thereby  changing  one 
spatial  pattern  instantly  to  another  like  frames 
in  a  cinema.  The  transitions  in  the  primary 
sensory  cortices,  visual,  auditory,  somatic 
and  olfactory  [Barrie,  Freeman  and  Lenhart, 
1996],  are  shaped  by  interactions  with  the 
limbic  system,  which  establish  multimodal 
unity,  selective  attention,  and  the  intentional 
nature  of  percepts.  The  interactions  of  the 
several  sensory  cortices  and  the  limbic 
system  occur  in  conjunction  with  goal- 
directed  actions  in  time  and  space.  Each 
cortical  state  transition  involves  synaptic 
changes  constituting  learning  throughout  the 
forebrain,  so  that  cumulatively  a  unified  and 
global  trajectory  is  formed  by  each  brain  over 
its  lifetime.  Each  spatial  pattern  appears  to 
reflect  the  entire  content  of  past  and  present 
experience  [Skarda  &  Freeman  1987],  that  is, 
a  meaning. 

The  most  important  experimental  finding  is 
that  the  neuroactivity  patterns  in  sensory 
cortex,  which  are  correctly  classified  on 
perception  of  conditioned  stimuli  by  the 
animals,  are  not  invariant  with  respect  to  the 
unchanging  physicochemical  stimuli.  The 
brain  activity  patterns  are  found  to  change 
slightly  but  significantly  with  any  change  in 
the  significance  of  the  stimuli,  such  as  by 
changing  the  reinforcement,  or  adding  new 
stimuli  [Freeman,  1992].  From  numerous 
tests  of  this  kind  the  conclusion  is  that  brain 
patterns  reflect  the  value  and  significance  of 
the  stimuli  for  the  animals,  not  a  fixed 
memory  store.  Each  pattern  formed  in 
response  to  the  presentation  of  a  stimulus  is 
freshly  constructed  by  chaotic  dynamics  in 
the  sensory  cortex,  in  cooperation  with  input 
from  the  limbic  system  enacting  processes  of 
attention  and  intention,  and  it  expresses  the 
history  and  existing  state  of  the  animal  as 
much  as  or  more  than  the  actual  incident 
stimulus.  The  patterns  cannot  be 
representations  of  stimuli  or  of  meanings  of 


stimuli.  They  are  active  states  induced  by 
stimuli,  constituting  evolution  of  the  brains  in 
their  growth  of  experience  [Piaget,  1930]. 

The  neural  basis  for  intentional  action 

The  making  of  a  representation  is  an 
intentional action. All intentional actions
begin with the construction of patterns of
neural activity in the limbic system.
Intentional action has been shown by use of
lesions and by comparative neuroanatomy and
behavior to be a product of the limbic system
[Herrick, 1948; Roth, 1987; Freeman, 1995]. In
mammals  all  sensory  input  is  delivered  to  the 
entorhinal  cortex,  which  is  the  main  source  of 
input  to  the  hippocampus,  and  the  main  target 
of  hippocampal  output  (Figure  2).  Goal- 
directed  action  must  take  place  in  time  and 
space,  and  the  requisite  organ  for  these 
matrices  is  the  hippocampus  with  its  'short 
term  memory'  and  'cognitive  map'  [O'Keefe 
and  Nadel,  1978]. 

For  example,  hunger  is  an  emergent  pattern 
of  neuroactivity  that  expresses  the 
requirements  of  brains  and  bodies  for 
metabolic  fuel  and  building  material.  It 
induces  a  state  transition  in  the  neural 
populations  of  the  forebrain  under  the 
influence  of  sensory  stimuli  from  the  gut  and 
the  brain's  own  chemoreceptors  for  its 
chemical  state.  It  is  also  shaped  by 
neurohormones  from  nuclei  in  the  brain  stem. 
The  emergent  pattern  impacts  the  brain  stem 
and  spinal  cord,  leading  to  stereotypic 
searching  movements  that  are  adapted  to  the 
immediately  surrounding  world.  Feedback 
from  the  muscles  and  joints  to  the 
somatosensory  cortex  provides  confirmation 
that  the  intended  actions  are  taking  place. 
The  impact  of  the  movements  of  the  body  on 
sensory  input  is  conveyed  to  the  visual, 
auditory  and  olfactory  systems.  All  of  these 
perceptual  constructs,  that  are  triggered  by 
sensory  stimuli  and  are  dependent  on  prior 
learning,  are  transmitted  to  the  limbic  system, 
specifically  to  the  entorhinal  cortex,  where 
they  are  combined.  When  an  animal  detects 
an  odor  of  food,  it  must  hold  it,  move,  take 
another  sniff,  and  compare  the  two 
concentrations  in  order  to  decide  which  way 
to  move  next.  The  difference  in  strength  has 
no  meaning,  unless  the  animal  has  records  of 
which way it moved, when the samples were
taken,  and  a  basis  for  determining  distance 
and  direction  in  its  environment.  These  basic 
operations  of  intentional  behavior  are 
properties  of  the  limbic  system.  The  same 
requirements  hold  for  all  distance  receptors, 
so  it  is  understandable  that  evolution  has 
provided  multimodal  sensory  convergence  in 
order  to  perform  space-time  integration  on  the 
Gestalt,  not  on  its  components. 
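
The point about the two sniffs can be illustrated with a minimal sketch. It is not a model taken from the text: the one-dimensional track, the Sniff record, and the function next_direction are hypothetical names introduced here only to show that a difference in concentration decides nothing until it is combined with a record of where and when each sample was taken.

```python
from dataclasses import dataclass


@dataclass
class Sniff:
    """One odor sample plus the record needed to interpret it (hypothetical)."""
    position: float       # where on a 1-D track the sample was taken
    time: float           # when the sample was taken
    concentration: float  # odor strength at that place and time


def next_direction(first: Sniff, second: Sniff) -> int:
    """Return +1 (move toward larger positions), -1 (toward smaller), or 0.

    0 means the comparison is undecidable: without a displacement and a
    temporal order, the difference in strength carries no direction.
    """
    moved = second.position - first.position
    if moved == 0 or second.time <= first.time:
        return 0
    change = second.concentration - first.concentration
    if change == 0:
        return 0
    # Sign of the estimated gradient along the track.
    return 1 if change / moved > 0 else -1


start = Sniff(position=0.0, time=0.0, concentration=0.2)
moved_right = Sniff(position=1.0, time=1.0, concentration=0.5)
moved_left = Sniff(position=-1.0, time=1.0, concentration=0.5)
print(next_direction(start, moved_right), next_direction(start, moved_left))
```

The same rise in concentration yields opposite decisions depending on which way the animal moved, which is the sense in which the difference alone has no meaning.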

In  the  description  thus  far  the  flow  of  neural 
activity  is  counterclockwise  through 
proprioceptive  and  exteroceptive  loops 
outside  the  brain.  Within  the  brain  there  is  a 
clockwise  flow  of  activity  constituting 
reafference.  When  a  motor  act  is  initiated  by 
activity  descending  into  the  brainstem  and 
spinal  cord,  the  same  or  a  similar  activity 
pattern  is  sent  to  all  of  the  sensory  systems 
by  the  entorhinal  cortex,  which  prepares  them 
for  the  impact  of  the  movements  of  the  body 
and,  most  importantly,  sensitizes  them  by 
shaping  their  attractor  landscapes  to  respond 
quite  selectively  to  stimuli  that  are  appropriate 
for  the  goal  toward  which  the  action  has  been 
directed.  These  reafferent  patterns  have  been 
denoted as the sense of effort [Helmholtz,
1879], reafferent signals [von Holst and
Mittelstaedt, 1950], efference copies [Sperry,
1950], and preafference [Kay et al., 1995,
1996].  They  are  the  essence  of  attention. 


Linear  versus  circular  causality  in 
self-organizing  systems 

The  conventional  view  of  sensory  cortical 
function  holds  that  stimuli  activate  receptors, 
which  transmit  to  sensory  cortex  through  a 
linear  causal  chain,  with  the  eventual  outcome 
of  a  motor  response  to  the  initiating  stimulus. 
Modeling  with  nonlinear  dynamics  shows 
that  the  stimulus  is  typically  not  the  initiating 
event.  Rather  it  is  the  search  for  the  stimulus 
that  arises  in  the  limbic  system  in  a  recurrent 
manner  from  prior  search  and  its  results. 
This  is  circular  causality  at  the  level  of 
intentional  behavior  [Merleau-Ponty,  1942]. 

Much  lower  in  the  hierarchy  of  brain 
organization  is  the  event  in  the  primary 
sensory  cortex,  which  consists  of  the 
destabilization  of  a  macroscopic  state  by  the 
introduction  of  microscopic  sensory  input. 
In  this  case  the  transition  from  a  prior  basin 
of  attraction  to  a  new  one,  which  has  been 
facilitated  by  limbic  modulation,  is  guided  by 
the  sensory  input  that  activates  a  learned 
nerve  cell  assembly  comprising  a  small 
subset  of  cortical  neurons.  The  transition  to  a 
new  state  is  global,  so  that  this  causal  chain  is 
also  circular.  The  stimulus-dependent  neural 
activity  of  a  few  neurons  triggers  the  state 
transition,  and  then  the  entire  domain  of  the 
primary sensory cortex transits to another
pattern,  which  in  the  words  of  Haken  (1983) 
"enslaves"  the  whole  set  of  cortical  neurons 
by  acting  as  an  "order  parameter".  This  new 
active  state  has  been  characterized  by  Ilya 
Prigogine  (1980)  as  a  "dissipative  structure", 
that  constitutes  the  "emergence  of  order  out 
of  chaos". 
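
A toy illustration of a state transition between basins of attraction may help here. It is an assumption made for illustration, not the cortical model discussed in the text: a single variable x obeys dx/dt = x - x^3 + u, which has two attractors near x = -1 and x = +1 when the input u is zero. The function name simulate and all parameter values are hypothetical.

```python
def simulate(pulse_strength: float,
             pulse_steps: int = 2000,
             relax_steps: int = 2000,
             dt: float = 0.01,
             x0: float = -1.0) -> float:
    """Integrate dx/dt = x - x**3 + u with forward Euler steps.

    The input u equals pulse_strength for the first pulse_steps steps and
    is zero afterwards, so the returned value is the state the system
    settles into once the input has been removed.
    """
    x = x0
    for step in range(pulse_steps + relax_steps):
        u = pulse_strength if step < pulse_steps else 0.0
        x += dt * (x - x**3 + u)
    return x


weak = simulate(0.2)    # too weak to destabilize the initial basin
strong = simulate(0.5)  # strong enough to carry the state across the barrier
print(round(weak, 3), round(strong, 3))
```

A brief input above threshold leaves the system in a different attractor after the input has ended, whereas a weaker input leaves it where it began. In this sketch the input selects the new state but does not itself constitute it.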

The  similarity  of  the  properties  of  neural 
activity  in  the  various  parts  of  the  limbic 
system  to  those  in  the  primary  sensory 
cortices  [Kay  et  al.,  1995,  1996]  indicates 
that  populations  of  neurons  there  also 
maintain  global  attractors,  which  are  accessed 
by  nonlinear  state  transitions,  and  which  are 
responsible  for  the  genesis  of  motor  patterns 
controlling  goal-directed  actions  and  of 
reafference  patterns  that  prepare  the  sensory 
cortices  for  the  consequences  of  those 
actions. 

A  hypothesis  on  the  causal  relations 
of  meanings  and  representations 

The  idea  is  proposed  that  representations  are 
formed  by  the  forward,  counterclockwise 
flow  of  neural  activity,  which  emerges  from  a 
microscopic  level  by  the  interactions  of 
neurons  and  neuronal  populations,  and  which 
places  the  motor  systems  of  the  brainstem 
and  spinal  cord  into  appropriate  basins  of 
attraction,  thereby  changing  the  sensory 
inflow  in  a  goal-directed  manner.  The 
making  of  a  representation  is  an  ordering  of 
the  neural  control  systems  of  the 
musculoskeletal  apparatus,  that  is  aimed  to 
elicit  sensory  feedback  of  a  certain  kind, 
namely  the  patterns  of  receptor  discharge 
from  representational  stimuli  transmitted  by 
other  beings,  that  place  the  sensory  cortices 
into  the  expected  basins  of  attraction.  The 
meaning  of  the  representation  is  implicit  in 
the  form  that  is  given  to  the  representation  by 
the  limbic  system. 

The  clockwise  backflow  of  neural  activity 
serves  as  an  order  parameter  to  modulate  and 
shape  the  neural  activity  patterns  of  the 
sensory  cortices,  which  transmit  the  states  of 
their  neural  populations  before  and  after  the 
expected  inputs  have  occurred,  and  also  if 
they  do  not  occur  as  expected,  or  at  all.  It 
comprises  not  only  the  exteroceptive  input 


but  the  proprioceptive  feedback  as  well.  This 
global  active  state,  enslaving  alike  the  limbic 
system  and  the  primary  sensory  cortices, 
shapes  the  meaning  not  only  of  the  unified 
sensory  input  consequent  to  the  transmitted 
representation,  but  also  of  the  emitted 
representation. 

The  implication  here  is  that  the  agent  who  is 
constructing  and  transmitting  the 
representation  cannot  fully  know  its  meaning 
until  after  the  immediate  consequences  have 
been  delivered  through  his  or  her  own 
sensory  systems.  More  generally,  a  poet, 
painter,  or  scientist  cannot  know  the  meaning 
of  his  or  her  creation  until  after  the  act  has 
been  registered  as  an  act  of  the  self,  nor  even 
until the listeners and viewers have
responded  with  reciprocal  representations  of 
their  own,  each  with  meaning  unique  to  the 
recipients. 

Conclusion 

Why  do  brains  work  this  way?  Animals  and 
humans  survive  and  flourish  in  an  infinitely 
complex  world  despite  having  finite  brains. 
Their  mode  of  coping  is  to  construct 
hypotheses  in  the  form  of  neural  activity 
patterns  and  test  them  by  movements  into  the 
environment.  All  that  can  be  known  is  that 
which  has  been  constructed,  tested,  and 
either  accepted  or  rejected  [Piaget,  1930; 
Merleau-Ponty,  1942].  The  same  limitation 
is  currently  encountered  in  the  failure  of 
machines  to  function  in  environments  that  are 
not  circumscribed  and  reduced  in  complexity 
from  the  real  world.  Truly  flexible  and 
adaptive  intelligence  operating  in  realistic 
environments  cannot  flourish  without 
meaning. 

This  global  state  variable  may  be  regarded  as 
a  mechanism  supporting  consciousness, 
which  in  the  neurodynamic  view  is  a  global 
internal  state  variable  composed  of  a 
sequence  of  momentary  states  of  awareness 
[Hardcastle  1995].  Its  regulatory  role  is 
comparable  to  that  of  the  operator  in  a 
thermostat,  that  instantiates  the  difference 
between  the  sensed  temperature  and  a  set 
point,  and  that  initiates  corrective  action  by 
turning  a  heater  on  or  off.  The  machine  state 
variable  has  little  history  and  no  capacities  for 
learning or determining its own set point, but
the  principle  is  the  same:  the  internal  state  is  a 
form  of  energy,  an  operator,  a  predictor  of 
the  future,  and  a  carrier  of  information  that  is 
available  to  the  system  as  a  whole.  The 
feedback  device  is  a  prototype,  an 
evolutionary  precursor,  not  to  be  confused 
with  awareness,  any  more  than  tropism  in 
plants  and  bacteria  is  to  be  confused  with 
intentionality.  In  animals  and  humans,  the 
operations  and  informational  contents  of  this 
global  state  variable  constitute  the  experience 
of  causation. 
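
The thermostat comparison can be sketched in a few lines. This is only an illustration of the feedback principle named above; the function thermostat_step and the deadband parameter are assumptions introduced here, not part of the text.

```python
def thermostat_step(sensed: float,
                    set_point: float,
                    heater_on: bool,
                    deadband: float = 0.5) -> bool:
    """Return the next heater state from the sensed temperature.

    The internal state is the difference between set point and sensed
    temperature. Corrective action follows when that difference leaves
    a small deadband (a hypothetical tolerance added to avoid chatter).
    """
    error = set_point - sensed
    if error > deadband:
        return True       # too cold: turn the heater on
    if error < -deadband:
        return False      # too warm: turn the heater off
    return heater_on      # close enough: keep the current state


print(thermostat_step(18.0, 20.0, heater_on=False))
print(thermostat_step(22.0, 20.0, heater_on=True))
```

The set point is supplied from outside and nothing is learned between steps, which is the limitation of the machine state variable noted in the text.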
INTELLIGENCE vs. MENTALITY:
Important but Independent Concepts

Abstract

Intelligence and mentality are frequently identified as though they were one
property designated by different predicates. That, however, appears to be a
misconception, albeit one with many intriguing ramifications. The most common
tendency presumably would be to identify intelligence with mentality of
a  high  order.  But  if  some  cognitive  systems  possess  mentality  of  a  low  order, 
then  they  have  mentality  without  intelligence.  Other  systems,  which  may  or 
may  not  qualify  as  "cognitive",  moreover,  appear  to  have  intelligence  without 
mentality.  Since  properties  are  the  same  only  if  they  have  the  same  class  of 
instances,  evidently  intelligence  and  mentality  cannot  be  the  same  property. 


1. Introduction. Perhaps the most common
mistake that occurs within the context
of theorizing about cognition is that
of  treating  intelligence  and  mentality  as 
if  they  were  the  same  thing.  If  we  were 
to  employ  the  term  "system"  to  stand  for 
any  entity-simple  or  complex-for  which 
specific  kinds  of  input,  causes,  or  stimuli 
(probabilistically)  bring  about  particular 
kinds  of  output,  effects,  or  responses,  no 
one  ought  to  be  tempted  to  suppose  that 
those  entities  are  necessarily  "cognitive", 
"intelligent"  or  possess  "mentality".  The 
concept  of  system  thus  defined  is  broad 
enough  to  include  sticks  and  stones  along 
with  digital  machines  and  human  beings 
within  its  extension  or  class  of  instances. 

Since  few  would  suppose  that  sticks  or 
stones-ordinary  sticks  and  stones,  at  any 
rate-are  things  that  are  capable  of  cogni- 
tion, the  intension  or  defining  conditions 
of  "cognitive  system"  had  better  exclude 


them  on  pain  of  demonstrating  its 
own  inadequacy.  If  some  systems 
are  cognitive,  while  others  are  not, 
then that property, too, had better
be carefully defined, with consideration
for whether or not it clarifies
and illuminates their differences as
well as their similarities. It may be
that  some  systems  properly  qualify 
as  "intelligent"  even  though  they  do 
not  properly  qualify  as  "cognitive". 

Of course, if we are unable to define
what it means for something to
be a "cognitive system", then we are
open to the charge that we literally
do not know what we are talking
about. My purpose here, therefore,
is  to  sketch  a  theory  of  the  nature 
of  mentality  that  brings  with  it  an 
account  of  the  nature  of  cognition. 
The  differences  between  mentality 
and  intelligence  are  then  explored 
with special concern for the prospects of
developing  tests  for  mentality  and  tests 
for  intelligence,  respectively,  which  tend 
to  support  the  distinctions  drawn  within 
this  context  and  their  consequences  with 
respect  to  differentiating  between  them. 

2.  Minds  as  Semiotic  Systems.  The  right 
place to begin, I believe, is with a theory
of mind that builds on a foundation provided
by Charles Peirce's theory of signs.
According  to  Peirce,  a  sign  is  a  something 
that  stands  for  something  else  in  some  re- 
spect or  other  for  somebody.  There  turn 
out  to  be  three  basic  classes  of  signs  that 
differ  in  the  way  in  which  they  stand  for 
something  else,  namely:  icons,  which  are 
things that resemble other things; indices,
which  are  things  that  are  causes  or  effects 
of  those  other  things;  and  symbols,  which 
are  merely  habitually  associated  with  the 
things  for  which  they  stand.  Relations  of 
resemblance and of cause-and-effect provide
natural as opposed to artificial signs.

Photographs,  paintings,  and  statues  (at 
least,  when  they  look  like  the  things  they 
stand  for)  are  therefore  icons  in  this  sense. 
Ashes  are  indices  of  fire  and  fire  of  ashes, 
because  they  are  causally  connected.  The 
words (such as "golf" and "tennis") that occur
in ordinary languages, such as English,
however,  are  symbols,  even  when  the  hab- 
itual associations  between  such  words  and 
that  for  which  they  stand  happen  to  be  re- 
inforced by  traditions,  customs,  and  prac- 
tices of  a  community.  Attaining  the  goals 
of  a  community  requires  cooperation,  and 
cooperation  is  facilitated  by  (the  capacity 
for)  successful  communication.  Although 
many  words  could  have  stood  for  things 
other  than  those  for  which  they  happen 
to  stand,  when  they  are  thus  reinforced, 
they  assume  the  standing  of  conventions. 


The triadic character of the sign
relation (which connects signs with
what they stand for and sign-users)
suggests the prospect of defining
minds as sign-using (or "semiotic")
systems,  where  "minds"  turn  out 
to  be  the  kinds  of  things  for  which 
something  can  stand  for  something 
else  in  some  respect  or  other.  This 
approach  implies  that  there  ought 
to  be  at  least  three  kinds  of  minds, 
namely:  those  that  have  the  capac- 
ity to  use  icons;  those  that  have  the 
capacity  to  use  indices;  and  those 
that  have  the  capacity  to  use  sym- 
bols, as  successively  stronger  kinds 
of minds (Fetzer 1988, 1990, 1996).

3. Conditions of Adequacy. Among
the most important but least mentioned
aspects of theory of mind are
the conditions that must be satisfied
for a theory of mind to qualify
as adequate. These include (CA-1)
explaining  how  mental  states  can 
exert  an  influence  upon  behavior; 
(CA-2)  the  differences  between  hu- 
man minds,  animal  minds,  and  the 
minds  of  machines,  if  such  a  thing 
is  possible;  and  (CA-3)  how  we  can 
tell-how  it's  possible  to  know-whe- 
ther  or  not  something  has  a  mind. 
These  may  not  be  the  only  desid- 
erata an  adequate  theory  of  mind 
must  satisfy,  but  they  are  among 
the  most  important.  Whether  or 
not  the  theory  of  minds  as  semi- 
otic systems  can  satisfy  them  is 
thus  a  measure  of  its  adequacy. 

Because signs "stand for" that
for which they stand in some respect
or  other,  things  that  "stand  for" 
other  things  characteristically  do 
so only partially rather than completely.
Because  everything  resembles  itself-by 
"being  like"  itself  in  every  property  and 
at  every  time-there  is  a  trivial  sense  in 
which  everything  stands  for  itself  iconi- 
cally.  Typically,  however,  things  "stand 
for"  other  things  only  in  certain  respects, 
as  in  the  case  of  a  driver's  license  photo, 
which  may  resemble  you  (on  a  bad  day) 
when  viewed  from  in  front  but  not  when 
viewed  from  the  side  (since  you  are  just 
not  that  thin)!  The  use  of  even  the  least 
complex  kind  of  sign-an  icon-thus  pre- 
supposes the  adoption  of  an  appropriate 
perspective,  that  is,  of  a  "point  of  view"! 

The  theory  of  minds  as  semiotic  sys- 
tems supports  a  distinction  between  con- 
sciousness and  cognition  relative  to  signs 
of  various  kinds.  A  system  is  conscious 
(with  respect  to  signs  of  specific  kinds) 
when  it  has  the  ability  to  utilize  signs  of 
that  kind  and  is  not  incapacitated  from 
the  exercise  of  that  ability.  Even  when 
a  person  is  familiar  with  the  rules  of  the 
road  and  understands  traffic  signs,  that 
does  not  guarantee  they  will  be  able  to 
perceive  and  respond  to  them.  The  con- 
ditions under  which  drivers  fail  to  obey 
signs  include  those  in  which,  although  a 
sign  was  present,  it  was  obscured  from 
vision,  the  driver  suffered  from  (temp- 
orary) blindness,  or  was  so  intoxicated 
that  his  attention  span  was  abbreviated. 

4.  The  Nature  of  Cognition.  The  phenom- 
enon known  as  cognition  appears  to  occur 
as  a  consequence  of  the  causal  interaction 
between  signs  and  minds  (or  sign-users). 
When  a  sign-user  is  conscious  in  relation 
to  signs  of  a  specific  kind,  then  the  pres- 
ence of  a  sign  of  that  kind  within  suitable 
causal  proximity  (probabilistically)  brings 
about  the  occurrence  of  cognition,  during 


which  the  sign  is  taken  as  stand- 
ing for  something  in  some  respect 
or  other  by  that  user.  Thus,  con- 
sciousness combines  ability  and 
capability,  while  cognition  arises 
as  an  effect  of  consciousness  and 
opportunity,  where  both  notions 
are  relative  to  signs  of  fixed  kinds. 

The  meaning  of  signs  for  sign- 
users  may  be  captured  most  ade- 
quately by  their  causal  influence 
upon  behavior.  This  influence  is 
affected  by  the  causal  interaction 
of  every  other  factor  whose  pres- 
ence or  absence  makes  a  difference 
to  the  behavior  of  that  system.  In 
the  case  of  human  beings,  the  full 
range  of  kinds  of  factors  that  tend 
to  affect  behavior  would  appear  to 
be  exhausted  by  motives,  beliefs, 
ethics,  abilities,  capabilities,  and 
opportunities.  The  role  of  oppor- 
tunities differs  somewhat  relative 
to  factors  of  these  other  kinds  in- 
sofar as  they  represent  the  actual 
situations  that  obtain  for  systems 
themselves  as  opposed  to  the  sit- 
uations they  believe  they  are  in. 

If we refer to some particular
combination of motives, beliefs,
ethics, abilities and capabilities
as a "context" Ci, then the meaning
of a sign S for a system z is
the totality of behaviors (actual
or potential) B1, B2, ... that the
system would (probabilistically)
display in the presence of that
sign, relative to various contexts
C1, C2, .... Thus, if we employ
"... => ..." as a subjunctive conditional
and "... =n=> ..." as the
causal conditional, the meaning
of sign S for z, relative to Ci, is:

(MS1)  C1zt => (Szt =n=> B1zt*) &
(MS2)  C2zt => (Szt =m=> B2zt*) &
...

where the causal influence of S within
context C1 would (with the strength n)
bring about behavior of kind B1; within
context C2 would (with the strength
m) bring about behavior of kind B2; etc.
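
The definition above can be read as a table that pairs each context with a behavior and a strength. The following sketch is one hedged way to write that down. The type names, the STOP_SIGN table, and the numerical strengths are hypothetical illustrations, not values given in the text.

```python
from typing import Dict, Tuple

Context = str
Disposition = Tuple[str, float]        # (kind of behavior, causal strength)
Meaning = Dict[Context, Disposition]   # one entry per context C1, C2, ...

# Hypothetical meaning of a stop sign S for one system z.
STOP_SIGN: Meaning = {
    "C1: attentive, unhurried driver": ("brake and stop", 0.95),
    "C2: driver fleeing pursuit": ("drive through", 0.80),
}


def disposition(meaning: Meaning, context: Context) -> Disposition:
    """Return the behavior the sign would tend to bring about in a context.

    This mirrors the conditional form "if z were in context C, then the
    presence of S would (with strength n) bring about behavior B".
    """
    return meaning[context]


behavior, strength = disposition(STOP_SIGN, "C2: driver fleeing pursuit")
print(behavior, strength)
```

On this reading the meaning of the sign is the whole table rather than any single row, since the same sign would bring about different behavior in different contexts.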

5.  Meaning  and  Behavior.  Thus,  when 
a  little  old  lady  from  Pasadena  reaches 
a  traffic  intersection  and  fails  to  notice 
a  stop  sign  because  her  vision  is  not  as 
good  as  it  used  to  be,  she  may  run  the 
stop  sign  and  cause  an  accident,  but  it 
would  not  have  been  on  purpose.  If  a 
felon  fleeing  with  police  in  hot  pursuit 
notices  a  stop  sign  but  runs  it  anyway, 
he  may  cause  an  accident,  but  because 
it  was  a  risk  that  he  was  willing  to  run. 
When  an  expectant  father  decides  that 
he  cannot  take  the  time  to  obey  a  sign 
because  his  wife  has  begun  labor  in  the 
back  seat,  he  might  pray  no  accident  is 
about  to  happen,  because  that  would  be 
the  worst  possible  outcome.  And  so  on. 

The  same  sign  (such  as  an  octagonal 
red  surface  with  the  letters  "S-T-O-P" 
inscribed)  thus  has  the  same  meaning 
for different systems when their (actual
or potential) behavior in the presence
of that sign (more precisely, the
strength of their tendencies toward behavior
of various kinds) is the same
across  different  contexts.  This  does  not 
mean  their  actual  behavior  is  the  same, 
because  they  may  have  been  in  differ- 
ent contexts,  but  only  that,  if  they  had 
been  in  the  same  context,  then  the  be- 
havior they  displayed-more  precisely, 
the  strength  of  their  tendencies  toward 
behavior  of  various  kinds-would  have 
been  the  same,  under  those  conditions. 


Similarly,  different  signs  have 
the  same  meaning  for  a  system 
when  the  (actual  or  potential)  be- 
havior it  would  display-more  pre- 
cisely, the  strength  of  its  tenden- 
cies toward  behavior  of  different 
kinds-would  be  the  same  across 
different  contexts.  It  can  turn  out 
that sameness-of-meaning differs
in  some  contexts  even  though  not 
in  others.  With  regard  to  purchas- 
ing power,  four  quarters,  two  half- 
dollars,  and  a  dollar  bill  would  be 
the  same;  from  other  perspectives, 
such  as  convenience  of  carrying  in 
a  wallet,  they  would  differ.  Same- 
ness of  meaning  thus  seems  to  be  a 
property  that  can  vary  by  degrees. 
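
One way to make "sameness of meaning by degrees" concrete is to count the contexts in which two signs would bring about the same behavior with the same strength. The sketch below is an assumption offered for illustration: the scoring rule, the function sameness_of_meaning, and the example tables are hypothetical and are not proposed in the text.

```python
import math
from typing import Dict, Tuple

Meaning = Dict[str, Tuple[str, float]]  # context -> (behavior, strength)


def sameness_of_meaning(a: Meaning, b: Meaning) -> float:
    """Fraction of contexts in which two signs have the same disposition."""
    contexts = set(a) | set(b)
    if not contexts:
        return 1.0
    agree = 0
    for c in contexts:
        if c in a and c in b:
            same_behavior = a[c][0] == b[c][0]
            same_strength = math.isclose(a[c][1], b[c][1])
            if same_behavior and same_strength:
                agree += 1
    return agree / len(contexts)


four_quarters: Meaning = {
    "purchasing power": ("buys one dollar of goods", 1.0),
    "carrying in a wallet": ("kept with the coins", 0.9),
}
dollar_bill: Meaning = {
    "purchasing power": ("buys one dollar of goods", 1.0),
    "carrying in a wallet": ("kept with the bills", 0.9),
}
print(sameness_of_meaning(four_quarters, dollar_bill))
```

With these made-up tables the two signs agree in one of two contexts, so the score lies between complete sameness and complete difference.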

6.  Animal  Mentality.  The  theory 
of  minds  as  semiotic  systems  also 
implies  that  there  should  be  a  cor- 
relation between  lower  species  and 
lower  mentality  and  higher  species 
and  higher  mentality.  Indeed,  sym- 
bolic mentality  presupposes  index- 
ical  mentality,  and  indexical  mental- 
ity presupposes  iconic.  It  would  be 
predictable  on  the  basis  of  the  semi- 
otic  account,  therefore,  that  there  is 
some  corresponding  progression  in 
nature  from  iconic  mentality  up  to 
symbolic,  where  each  kind  might  be 
instantiated  in  nature  in  more  than 
one  variety,  depending  on  the  num- 
ber and  types  of  signs  a  system  uses. 

The lowly E. coli bacterium, for example,
has the tendency to swim toward
twelve different chemotactic
substances  and  to  swim  away  from 
eight  more  (Bonner  1980,  p.  63).  It 
should  be  evident,  from  the  semiotic 
perspective,  that  E.  coli  possess  the 
ability to recognize different instances
of each of these twenty chemotactic
substances,  since  otherwise  they  would 
be  unable  to  respond  to  them  properly. 
Somewhat  surprisingly,  therefore,  E.  coli 
appear  to  possess  iconic  mentality.  It  is 
not  necessary  that  E.  coli  should  also  be 
aware  that  twelve  of  these  are  benefic- 
ial and  eight  of  these  are  harmful,  since 
the  tendency  to  avoid  harmful  stimuli 
and  to  encounter  beneficial  ones  would 
evolve  as  a  function  of  natural  selection 
even  in  the  absence  of  indexical  ability. 

Another  (no  doubt,  less  controversial) 
example  of  animal  mentality  arises  with 
the  vervet  monkey,  who  makes  at  least 
three  different  alarm  calls,  one  for  eagles 
(or airborne predators), one for leopards
(or earth-bound predators), and one for
snakes (or earth-bound curiosities). A
call  of  the  first  kind  tends  to  cause  them 
to  hide  in  thick  bushes;  a  call  of  the  sec- 
ond kind,  to  run  up  into  trees;  and  a  call 
of  the  third  kind,  to  look  down  for  some- 
thing interesting  to  watch  (Slater  1985, 
pp.  155-157).  Observe  how  beautifully 
this example satisfies the conception of
minds as sign-using systems, where the
meaning of a sign is given by the actual
and  potential  behavior  it  induces  within 
a  context.  Here  clearly  at  least  indexical 
and  perhaps  even  symbolic  mentality  is 
displayed,  since  the  alarm  calls  even  ap- 
pear to  possess  elements  of  conventions. 

7.  Machine  Mentality.  It  may  come  as  a 
disappointment  to  some  that  the  theory 
of  minds  as  sign-using  systems  implies 
that  ordinary  computers-von  Neumann 
or  digital  machines,  at  least-are  not  the 
possessors  of  minds.  This  follows  from 
the  realization  that  these  machines  are 
designed to process marks or to manipulate
syntax, where "syntax" consists
of  strings  of  marks  that  are  subject 
to  interpretation.  Since  digital  mach- 
ines are  capable  of  processing  syntax, 
it  may  be  tempting  to  suppose  they 
must  be  possessors  of  mind.  But  the 
capacity to process syntax is merely
a necessary and not a sufficient condition
for a system to be "semiotic",
since  it  must  also  be  the  case  that 
the  marks  that  are  processed  stand 
for  something  for  that  system  rather 
than  just  for  the  users  of  that  system. 

The  principal  difference  between 
semiotic systems and syntax-processing
systems (such as digital machines)
is that the syntax that is processed by
a semiotic system stands for something
for  that  system  itself,  while  the  syntax 
that  is  processed  by  a  digital  machine 
instead  stands  for  something  for  users 
of  that  system.  Since  a  mind  is  a  some- 
thing for  which  something  can  stand  for 
something  else  for  that  system,  clearly 
systems  that  lack  the  ability  of  taking 
something  to  stand  for  something  for 
itself  cannot  properly  qualify  as  minds. 
This  might  be  described  as  the  "static" 
difference  between  minds  and  digital 
machines. (Fetzer 1988, 1990, 1996.)

Another  difference  that  some  may 
consider  to  be  at  least  equally  instruc- 
tive might  be  described  as  a  "dynam- 
ic" difference  between  them,  which  is 
that  digital  machines  are  governed  by 
algorithms  implemented  in  the  form 
of programs, because of which transitions
between machine states assume
the character of (what has sometimes
been called) "disciplined step satisfaction".
A semiotic analysis of the transitions
that typify human thought, by
contrast, suggests that human thought
has an associationistic dimension that
disciplined step satisfaction lacks (Fetzer
1994). The potential for the same
sign to function iconically, indexically,
or symbolically (possibly probabilistically)
in the same or different contexts
poses  apparently  insuperable  problems 
for  those  who  would  reduce  thinking  to 
computing.  They  are  just  not  the  same. 

8.  Mentality  and  Intelligence.  Perhaps 
the  gravest  blunder  that  has  occurred 
in  discussions  of  machines  and  mental- 
ity has  been  to  ignore  the  difference  be- 
tween systems  that  yield  the  same  out- 
put given  the  same  input  and  systems 
that  yield  the  same  output  given  the 
same  input  by  means  of  the  same  pro- 
cesses. A  simple  illustration  arises  in 
the  case  of  question-answering  mech- 
anisms. The  prospect  that  a  machine 
might  be  able  to  answer  questions  by 
providing  the  same  answers  that  a  hu- 
man being  might  provide  not  only  does 
not  prove  it  is  human  but  also  does  not 
prove  it  has  a  mind.  The  critical  differ- 
ence concerns  the  issue  cited  above  be- 
tween systems  for  which  things  stand 
for  other  things  for  those  systems  them- 
selves and  those  for  which  things  stand 
for  the  users  of  those  systems.  No  one 
should  doubt  that  digital  machines  can 
simulate  thought  processes;  the  serious 
issue  is  whether  they  can  replicate  them. 
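
The distinction between yielding the same output and yielding it by the same process can be shown with a deliberately simple sketch. The two functions below are hypothetical examples chosen for illustration and say nothing about minds. They show only that agreement in input and output leaves the underlying process undetermined.

```python
def sum_by_counting(n: int) -> int:
    """Add the integers 1..n one step at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_formula(n: int) -> int:
    """Obtain the same value in a single step from a closed form."""
    return n * (n + 1) // 2


# Identical answers for every input tried, reached by different processes.
print(all(sum_by_counting(n) == sum_by_formula(n) for n in range(50)))
```

An observer who sees only questions and answers cannot tell these two apart, which is why matching answers alone cannot settle whether one system replicates the processes of another.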

The  difference  between  intelligence 
and  mentality  arises  again  at  this  junc- 
ture. The  appropriate  criterion  for  men- 
tality, given  the  theory  of  minds  as  semi- 
otic  systems,  appears  to  be  the  ability  to 
make  a  mistake  (Fetzer  1988,  1990,  and 
1996,  for  example).  Things  that  have  the 
ability  to  make  mistakes,  after  all,  have 
the  ability  to  take  something  to  stand  for 


something,  while  doing  so  wrongly. 
This  is  an  ability  that  appears  to  be 
wide-spread  among  animals,  more- 
over, but  impossible  for  digital  ma- 
chines, which  can  malfunction,  but 
whose  "mistakes"  are  attributed  to 
those  who  design,  program  and  use 
them  rather  than  to  those  machines. 

In  the  case  of  intelligence,  how- 
ever, it  would  be  at  least  faintly  ri- 
diculous to  suppose  that  a  suitable 
test  would  be  the  ability  to  commit 
mistakes!  Something  more  like  the 
opposite-the  ability  to  avoid  them- 
would  seem  to  be  a  more  appropri- 
ate criterion.  But  this  is  because  in- 
telligence is  commonly  understood 
to  be  akin  to  a  high  order  of  mental- 
ity. Machines  that  are  incapable  of 
committing  mistakes  may  still  be  en- 
titled to  be  described  as  "intelligent", 
especially  when  they  are  capable  of 
the  successful  performance  of  com- 
plex tasks,  such  as  the  manipulation 
of  syntax.  Digital  machines  thus  exem- 
plify intelligence  without  mentality, 
where  the  conflation  of  these  proper- 
ties seems  to  be  a  significant  mistake. 
ABSTRACT

A  hypothesis  is  proposed  that  each  cortico  -  basal  ganglia  - 
thalamocortical  loop  is  a  control  system  that  possesses  a  model  of 
behavior  of  its  controlled  object.  The  control  system  utilizes  error 
signals  that  are  distributed  via  dopaminergic  neurons  of  the 
substantia  nigra  pars  compacta  and  ventral  tegmental  area  to  adjust 
both  the  control  law  and  the  model,  i.e.,  for  learning.  The  whole 
system  of  cortico  -  basal  ganglia  -  thalamocortical  loops  is 
considered  as  a  system  with  nested  hierarchies  of  functional  loops. 
Movement  toward  higher  hierarchical  levels  is  accompanied  by 
generalization  of  encoded  parameters.  An  error  signal  computed  at  a 
particular  level  is  sent  to  the  model  of  the  same  level  and  to  the 
higher loop as well. Within the proposed theoretical framework,
Parkinson's disease, which is caused by a substantial loss of
substantia nigra dopaminergic neurons, can be conceptualized as a
disorder of the error distribution system. Necessary experimental
data supporting the proposed hypothesis are discussed.

KEYWORDS:  Optimal  Control  System,  informational 
signal,  initiating  signal,  error  signal,  basal  ganglia 

1.  INTRODUCTION 

Cortico  -  basal  ganglia  -  thalamocortical  circuits  are 
considered  to  be  the  major  component  of  the  highest  brain 
hierarchical  levels.  Five  such  circuits  were  distinguished:  the 
"motor"  (or  "skeletomotor"),  the  "oculomotor",  the 
"dorsolateral  prefrontal",  the  "lateral  orbitofrontal",  and  the 
"anterior  cingulate"  [1].  According  to  the  existing  views, 
each  basal  ganglia  -  thalamocortical  circuit  receives  its 
multiple  corticostriate  projections  only  from  functionally 
related  cortical  areas.  Moreover,  each  circuit  is  formed  by 
partially  overlapping  corticostriate  inputs  which  are 
progressively  integrated  in  their  passage  through  pallidum 
and  substantia  nigra  (pars  reticulata)  to  the  thalamus,  and 
from there to a definite cortical area. Usually the target area
is one of those that sent projections to the basal ganglia.
This led to the hypothesis that the characteristic feature of
all basal ganglia - thalamocortical circuits is the
combination of "open" and "closed" loops.
According to current views, similar neuronal operations are
performed at comparable stages of each of the five mentioned
loops; the apparent uniformity of synaptic organization at
corresponding levels of these loops and the parallel nature of
these circuits are indirect evidence for this view.


Classical neurobiological theories do not provide a reasonable
explanation of how these loops function and why they arose in
the course of evolution. However, it is relatively easy to build
a conceptual theory of the function of these loops by using
modern control theory.

2.  A  NOTION  OF  GENERIC  NEURAL 
OPTIMAL  CONTROL  SYSTEM 

Experimental studies of spinal Central Pattern Generators
(CPGs) for locomotor and scratching hindlimb movements in
cats have demonstrated that these CPGs possess a model of
object behavior [2], and that a CPG treats model flow as a
component of peripheral afferent flow. The latter means that
both flows interact on a parity basis. A CPG deprived of
afferent flow (after deafferentation) can generate a "proper"
rhythm by utilizing model afferent flow.
Later theoretical developments led to the conclusion that a
CPG is an operating regime of the spinal Optimal Control System
(OCS), and that every neuronal OCS is constructed according to
the same functional principle, regardless of its location in the
hierarchy of the nervous system (Fig. 1). Such a neural
control system contains two distinct functional
subdivisions: (1) a controller, the subsystem providing a
governing set of rules or commands (a control law) that
directs the action of the recipient of these rules, the
controlled object; and (2) a model, the subsystem that
generates a model of object behavior, i.e., the expected afferent
flow from the controlled object. The function of the model is
to predict with high probability any given state of the
controlled object that can result from influences on it from its
controlling system.

There are two main reasons for the control system to have a
model of the object behavior: incomplete observability and
incomplete controllability of the controlled object. The
presence of the model within any neural control system is a
consequence of its optimality. Model afferent flow is used by
the system to determine the most probable current state of the
object by filtering afferent information (for simplicity, the
filtering mechanism is placed in the control law box and is not
shown in Fig. 1). Another advantage of having a model is that
the control system can obtain very important information in the
form of mismatch, or error, signals. A mismatch signal between
real and model flows is necessary for the process of learning
(see below) and is also computed during the filtering process
(Fig. 1).
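
The interplay of control law, model, filtering and mismatch
described above can be sketched in a few lines. This is only a
minimal illustration under stated assumptions: the object is a
scalar linear system, the filter is a fixed-weight blend of
predicted and real afferent flow, and all names (ScalarOCS, gain,
a_model, blend, run) are hypothetical and do not come from the
paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScalarOCS:
    """Toy OCS: a control law plus an internal model of the controlled object."""
    gain: float = 0.5       # control law: proportional gain (assumed)
    a_model: float = 1.0    # model: assumed object dynamics x' = a_model * x + u
    blend: float = 0.3      # filter: weight given to the mismatch (assumed)
    estimate: float = 0.0   # most probable current state of the object

    def command(self, target: float) -> float:
        """Control law: drive the estimated state toward the target."""
        return self.gain * (target - self.estimate)

    def update(self, u: float, afferent: Optional[float]) -> float:
        """Filter one cycle and return the mismatch between real and model flow."""
        predicted = self.a_model * self.estimate + u   # model afferent flow
        if afferent is None:                           # deafferented: model flow only
            self.estimate = predicted
            return 0.0
        mismatch = afferent - predicted                # error signal
        self.estimate = predicted + self.blend * mismatch
        return mismatch


def run(ocs: ScalarOCS, a_true: float, x0: float, target: float,
        steps: int, deafferented: bool = False) -> Tuple[List[float], float]:
    """Simulate the loop; the real object obeys x' = a_true * x + u."""
    x = x0
    ocs.estimate = x0
    mismatches = []
    for _ in range(steps):
        u = ocs.command(target)
        x = a_true * x + u
        mismatches.append(ocs.update(u, None if deafferented else x))
    return mismatches, ocs.estimate
```

When a_true equals a_model the mismatch stays at zero, and when
the two differ a nonzero error signal appears that a learning
rule could use. With deafferented=True the estimate is driven by
model flow alone, loosely mirroring the deafferented CPG that
still produces a "proper" rhythm.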

Figure 1. Functional architecture of a generic neural optimal
control system (OCS). Types of signals, informational and
initiating, that the controlled object and subunits of the OCS
send to each other are shown by single and double arrows,
respectively.

It is necessary to note that the two functional subdivisions of
any given OCS described above may be inseparable anatomically.
Both the control law and the model can be realized within the
same neural circuit in the simplest of control systems. These two
functional subdivisions can be more clearly separated
anatomically in complex control systems.

It is rather easy to imagine a hierarchy of neuronal OCSs.
A lower OCS becomes the controlled object of a higher OCS. Fig.
1 illustrates what signal types are sent by one hierarchical
level to another. It can be seen that the controlled object
sends to its controlling system, the lower OCS, the same types of
signals that the lower OCS sends to a higher OCS. Those are
initiating and informational signals. The latter are used by a
control system to compute a control output that minimizes
the initiating signals. A lower automatism is initiated by a
simple tonic command, which means that a trajectory at the lower
level corresponds to a point within a given system state at the
higher command OCS. Therefore, movement toward higher
hierarchical levels is accompanied by generalization of encoded
parameters, i.e., an increase in their abstraction. It is clear
that there should be a match between the control level and its
detectors, because the latter should properly describe the
corresponding state space coordinates of the controlled object.
The hierarchy of detectors is created in such a way that lower
detectors become controlled objects for the higher detectors.
The above-mentioned theoretical conclusions created a
foundation for theoretical analysis of the organization of the
highest brain levels. Cortico - basal ganglia - thalamocortical
loops are the major anatomical substrate of the highest brain
levels, so let us analyze the available anatomical and
physiological data in the language of control theory.
It is easy to identify in these loops the corresponding
functional subdivisions and types of signals.
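
The hierarchical relation described earlier in this section, in
which one tonic command from the higher level unfolds into a
whole trajectory at the lower level, can be illustrated with a
toy example. Everything here is an assumption made for
illustration: the state labels, the sinusoidal cycle, and the
choice of peak amplitude as the generalized parameter reported
upward are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List

# Hypothetical higher-level states; each is a single point for the higher OCS.
TONIC_DRIVE: Dict[str, float] = {"rest": 0.0, "walk": 1.0, "run": 2.0}


def lower_automatism(drive: float, steps: int = 8) -> List[float]:
    """Lower OCS: unfold a tonic command into one cycle of a rhythmic trajectory."""
    return [drive * math.sin(2 * math.pi * k / steps) for k in range(steps)]


def informational_signal(trajectory: List[float]) -> float:
    """Generalized parameter reported upward: the peak amplitude of the cycle."""
    return max(abs(v) for v in trajectory)


def higher_step(state: str) -> float:
    """Higher OCS: issue one tonic command and read back one generalized value."""
    trajectory = lower_automatism(TONIC_DRIVE[state])
    return informational_signal(trajectory)
```

The higher level handles one number per state, while the lower
level handles the eight samples of the cycle, which is the sense
in which encoded parameters become more general on the way up.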


3.  SKELETOMOTOR  CORTICO  -  BASAL 
GANGLIA  -  THALAMOCORTICAL  LOOP 

The skeletomotor circuit is largely projected on the putamen,
which receives projections from the motor and somatosensory
cortices (Fig. 2a). In addition to these projections, the
putamen also receives projections from area 5, from lateral
area 6 including the arcuate premotor area, and from the
supplementary motor area. While the most prominent
projections of each of these cortical areas go to the putamen,
there is slight encroachment of each projection upon
neighboring regions of the caudate nucleus. Additional
corticostriate inputs to the "motor" circuit from other
functionally related regions (precentral, ventral cingulate
premotor area, the supplementary somatosensory area and
certain parts of superior and inferior parietal lobules) are
still in question. The putamen sends topographically organized
projections to discrete regions of the globus pallidus (e.g.,
ventrolateral two-thirds of both the internal and the external
segments) and to caudolateral portions of the pars reticulata
of the substantia nigra. The above-mentioned internal
pallidal regions and substantia nigra send topographic
projections to specific thalamic nuclei including nucleus
ventralis lateralis pars oralis (VLo), lateral part of nucleus
ventralis anterior pars parvocellularis (VApc), lateral part of
nucleus ventralis anterior pars magnocellularis (VAmc), and
the centromedian nucleus (CM). The motor circuit is closed
by means of thalamocortical projections from VLo and
lateral VAmc to the supplementary motor area (SMA), from
lateral VApc (from VLo as well) to premotor area (PM), and
from VLo and CM to motor cortex (MC).
Thus, the general rule of connections between the cortex and
the basal ganglia may be formulated in the following way:
each part of the basal ganglia that comprises a specific basal
ganglia - thalamocortical loop receives inputs from much
bigger regions of the cortex than those to which it projects
its signals.

Figure 2. Structural and functional organization of the
skeletomotor circuit. a - "closed" and "open" cortico - basal
ganglia - thalamocortical loops. Abbreviations: APA -
arcuate premotor area; GPi - internal segment of globus
pallidus; MC - motor cortex; PM - premotor cortex; SC -
somatosensory cortex; SMA - supplementary motor area; SNr
- substantia nigra pars reticulata; tn - thalamic nuclei. b -
the cortico - basal ganglia - thalamocortical circuit is a
control system which has a model of the controlled object.

The  Basal  Ganglia  is  the  Major  Substrate  for  the  Model 

Closed neuronal loops are the substrate for the model.
Therefore, it is possible to redraw Fig. 2a in another way
(Fig. 2b). It is necessary to note that the functional
subdivision shown in Fig. 2b should not be completely
identified with the anatomical subdivision. This identification
may be made only to a certain extent (see Introduction). But
for our purpose we may do this as a first approximation and
consider the basal ganglia as a system which models the
controlled object.

It is not difficult to determine what the controlled object is
for the controlling system of the skeletomotor basal ganglia -
thalamocortical circuit. It is the body of the animal and the
environment. The model describes the behavior of the body
and the environment during animal movements.
To predict the behavior of the object, the model uses the
language of afferent signals that enter the system. For the
spinal OCS, it was the language of peripheral afferents. At
the level of the basal ganglia - thalamocortical circuit, the
situation is different. Lower OCSs are responsible for
different motor automatisms, and the problem of basic
movement coordination is solved at those lower levels.
Various motor control levels, such as initiating systems of the
brainstem, the cerebellum, and even the cortical level (motor
cortex), are controlled objects for the cortico - basal ganglia -
thalamocortical loop. The latter is a hierarchical control
system. Additionally, the cortical area to which the basal
ganglia project their signals receives inputs from other cortical
areas, which means that a great variety of cortical detectors
(including very complex ones) send signals to this area. These
detectors determine the values of parameters describing the state
of the object or its parts at any given moment in time. The model
predicts the behavior of these detectors as well.

Fig. 3 illustrates how different levels of the skeletomotor
cortico - basal ganglia - thalamocortical loop are functionally
interconnected with each other. Numerous subloops
subordinated to one another can be distinguished within the
skeletomotor cortico - basal ganglia - thalamocortical loop.
The higher the subloop in the hierarchy, the more abstract
the parameters processed by this loop. Clearly, each subloop
has to receive corresponding afferent inputs to function
properly. This is why all types of afferent information have to
be processed by different cortical regions before afferent
signals arrive in the corresponding subloop. For instance, the
information from the skeletomotor subloop that supplies the
motor cortex is not directly intermingled with information
from the cerebellum that arrives in the sensorimotor cortex.
Cerebellar projections go to the cortex that is located between
the motor and sensorimotor cortices.

Figure 3. Hierarchy within the skeletomotor loop.
Dopaminergic neurons are included in the error distribution
system.

The model for which the basal ganglia - thalamocortical
circuit is the substrate is used in two ways. First, as
mentioned above, it is used during execution of the
movement to predict the transition to a new state. Second, it
is used during planning of a movement, which relies on the model
entirely because there is no afferent information about
future states. It is hard to imagine this process without using
a model. Moreover, a cause-effect model is not constrained
by real time and may function at rates faster than real time,
which is necessary for multistep planning. It is obvious that
using the model without the constraint of real time requires
efferent and afferent channels to be cut off until a correct
decision is found. It is not difficult to imagine how
deafferentation or deefferentation can be done at this level:
different inhibitory mechanisms can be used.
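
Multistep planning with a cause-effect model that runs outside
real time can be sketched as a search over predicted states while
the efferent channel stays closed. The sketch below is an
assumption-laden illustration: breadth-first search is a
swapped-in textbook technique, and the one-dimensional "track"
model and all names (plan, track_model, executed) are hypothetical
and not part of the paper.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence


def plan(start: int, goal: int, actions: Sequence[int],
         model: Callable[[int, int], int], max_depth: int = 10) -> Optional[List[int]]:
    """Breadth-first search over model-predicted states; nothing is executed."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = model(state, action)        # predicted, not observed
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


def track_model(position: int, step: int) -> int:
    """Hypothetical cause-effect model: a track with positions 0..5."""
    return min(5, max(0, position + step))


executed: List[int] = []                      # efferent channel, closed while planning
steps = plan(start=0, goal=3, actions=[-1, 1], model=track_model)
assert executed == []                         # planning sent no commands to the object
if steps is not None:
    executed.extend(steps)                    # channel opened once a decision is found
```

The search touches only the model, so its speed is limited by
computation rather than by the time the object would need to
move. Commands reach the object only after a complete sequence
has been found.
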
The reasoning above helps to better understand the
circuitry of the basal ganglia and points the way to future
investigations. Several simple suggestions may be made
about basal ganglia circuitry. A few examples are
discussed below. The system at the higher level has to "jump"
from one state to another while performing controlling
functions. The optimal (easiest) way to create such a system
is to build it on the basis of pacemaker neurons or neurons
possessing bistable properties. It is easy to create a system
that is able to switch from one stable state to another while
simultaneously counting time intervals by using simple
trigger elements, i.e., neurons having the above-described
properties. For instance, it is well known that such neurons
are frequently a part of central pattern generator circuitries in
many animal species. They model the period of rhythm
generation during locomotion, breathing, etc.
Such neurons should be included in the basal ganglia
circuitry. Otherwise basal ganglia circuits that include
inhibition of inhibitory neurons will not work. Such
processes take place in the striatum, GPe (external segment of
globus pallidus), GPi (internal segment of globus pallidus),
and pars reticulata of the substantia nigra. Moreover, there
could be one additional interesting mechanism: more
complex circuit triggers. A system having a large number of
such triggers, each of which can be in only one of two stable
states, can encode the whole variety of possible states of the
object. Therefore, a hierarchy of states may be encoded so that
higher level triggers change less frequently than lower level
triggers.
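
A bank of two-state triggers in which higher-level triggers
change less often than lower-level ones behaves like a binary
counter. The sketch below makes that reading explicit. It is an
analogy only: the class name TriggerBank and the ripple-carry
wiring are assumptions for illustration, not a claim about actual
basal ganglia circuitry.

```python
from typing import List


class TriggerBank:
    """Bank of two-state triggers; index 0 is the lowest level."""

    def __init__(self, n: int) -> None:
        self.bits: List[int] = [0] * n
        self.flips: List[int] = [0] * n   # how often each trigger changed state

    def tick(self) -> None:
        """Advance one time step: flip the lowest trigger and ripple the carry."""
        for i in range(len(self.bits)):
            self.bits[i] ^= 1
            self.flips[i] += 1
            if self.bits[i] == 1:         # no carry, higher triggers stay put
                break

    def state(self) -> int:
        """Object state encoded by the whole bank."""
        return sum(bit << i for i, bit in enumerate(self.bits))


bank = TriggerBank(3)                     # 3 triggers encode 2**3 = 8 states
for _ in range(8):
    bank.tick()
```

Over one full cycle of eight steps the lowest trigger flips eight
times, the middle one four times and the highest one twice, so
each level changes half as often as the level below it.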

Dopaminergic  Neurons  are  a  Part  of  An  Error  Distribution 
System 

It was implied above that the control system uses the model
while performing its function. Such systems must have an
error distribution system in order to properly tune the model
to the object. It is easy to find one in the skeletomotor cortico -
basal ganglia - thalamocortical circuit. The error distribution
system includes dopaminergic neurons of the substantia nigra
and the adjacent mesocorticolimbic group (Fig. 3). The cell body
subgroups, one located in the substantia nigra pars compacta and
the other in the ventral tegmental area, have a complex
organization and are no longer defined in terms of striatal or
mesocorticolimbic projections. These subgroups are intermingled,
and some mesocorticolimbic projections have their origin in the
substantia nigra and vice versa [3].
This view is in good agreement with numerous experimental
data. Dopaminergic neurons of the pars compacta and
mesencephalic tegmentum react to any behavioral or
environmental change, i.e., to stimuli which are not
predicted by the model. If the stimulus can be predicted, the
situation is different. For instance, in a conditioning paradigm,
dopaminergic neurons respond to the unconditioned stimulus at
the beginning of learning. Later, as the animal learns the
task, the cells respond to the conditioned stimulus, which cannot
be predicted, and do not respond to the unconditioned one [4].
However, they will fire if the unconditioned stimulus is not
presented at its previously predicted time, for instance, if the
stimulus appears earlier than expected. Previously described
patterns of dopamine neuron firing can be misleading with regard
to the stimuli that activate this neuronal system, and some
authors have suggested that the signaling of dopamine neurons
predicts future reinforcement [5]. Although the idea of
prediction is still the major feature of their model, the authors
ascribe this function to a circuit that includes dopaminergic
neurons. As we have seen above, it is more appropriate to
consider the whole basal ganglia circuit as a substrate for
predictions. In this case, the preceding firing of dopaminergic
neurons in response to an unpredicted conditioned stimulus can
serve to tune the basal ganglia circuitry toward correct future
prediction of the unconditioned stimulus.
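
The shift of the response from the unconditioned stimulus (US)
to the conditioned stimulus (CS) can be reproduced with a simple
prediction-error rule. The sketch uses a temporal-difference
style update, a standard technique swapped in here for
illustration; the paper does not specify a learning rule. The
trial length, stimulus times, learning rate and all names (T, CS,
US, run_trial, weights) are assumed values, not data.

```python
from typing import List, Optional

T, CS, US = 10, 2, 6   # trial length, CS onset, usual US time (illustrative values)


def run_trial(w: List[float], us_at: Optional[int], alpha: float = 0.0) -> List[float]:
    """Return the prediction error at each time step; learn when alpha > 0."""
    def value(t: int) -> float:
        # Time steps before the CS carry no prediction.
        return w[t - CS] if CS <= t < T else 0.0

    delta = [0.0] * T
    for t in range(-1, T - 1):                    # transition t -> t + 1
        us = 1.0 if us_at == t + 1 else 0.0
        delta[t + 1] = us + value(t + 1) - value(t)
        if alpha > 0.0 and t >= CS:
            w[t - CS] += alpha * delta[t + 1]     # tune the model
    return delta


weights = [0.0] * (T - CS)
naive = run_trial(weights, us_at=US)              # before learning
for _ in range(300):
    run_trial(weights, us_at=US, alpha=0.3)
trained = run_trial(weights, us_at=US)            # US at the predicted time
omitted = run_trial(weights, us_at=None)          # US withheld
early = run_trial(weights, us_at=US - 2)          # US earlier than predicted
```

Before learning the error appears at the US. After learning it
appears at the unpredictable CS and vanishes at the US. A US
delivered earlier than predicted again produces a positive error,
and a withheld US produces a negative one, so in this sketch the
error signal carries a sign.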

There are three possible locations where mismatch signals
between model and real flows may be calculated before they go to
the dopaminergic neurons mentioned above. First, in the cortex,
from which signals about mismatch go to those putaminal and
caudate neurons which project to dopaminergic neurons. Second, in
the putamen and the caudate nucleus. Third, in both places. But
it is necessary to note that there are no differences in
principle between these possibilities. In all cases the error
signal resulting from the mismatch between model and real flows
will reach dopaminergic neurons.

In addition to the mismatch signals described above,
dopaminergic neurons also receive less numerous initiating
inputs from other sources: the entopeduncular nucleus, the
dorsal nucleus of raphe, the central nucleus of the amygdala,
and the bed nucleus of the stria terminalis. There are also
some indications of direct cortical inputs to the substantia
nigra [see 6]. The existence of multiple inputs to the error
distribution system does not contradict the theory. They
can create new minimization criteria for the system.
Thus, a mismatch signal computed at the level of one subloop
goes through dopaminergic neurons to the model of the same
level and to the controlling network (corresponding cerebral
cortex) of the higher subloop.

Learning in the basal ganglia starts when initiating signals
become larger than minimal and stops when they are
minimized. It is possible to suggest that very complex
strategies may be used at this level to minimize initiating
signals. The strategy of random search may be used in early
ontogenesis, when there is no knowledge of object behavior.
Later, more complex and more advanced learning strategies
may be used. These strategies can be based on some specific
mechanisms, for instance, memorization of informational
context, memorization of incorrect decisions, etc. The
complexity of cortical and basal ganglia networks makes
such a suggestion reasonable. Moreover, in this system the error
signal possesses a sign: dopaminergic influence excites the
direct pathway from the putamen to the globus pallidus and
inhibits the indirect one. Clearly, the sign significantly
accelerates the process of learning, and fast tuning of the
model to the object becomes possible. It was already
mentioned that a hierarchy of parameters exists at this level,
and the simplest analogy can be given here. Suppose
the predictive mechanism is built on the basis of a counter
that encodes object states. In this case, if this predictive
mechanism is working quite well, it will be necessary to
adjust only the lower digits by changing the thresholds of their
switches.

Parkinson's Disease

Parkinson's disease is one of the most studied
neurodegenerative disorders. Clinical studies and
experimental data from animal models of parkinsonism have
convincingly shown that death of dopaminergic neurons of
the substantia nigra leads to Parkinson's disease. Akinesia,
muscular rigidity and tremor are the main symptoms of this
disease [see 7, 8].

Thus, based upon the proposed theoretical approach, it is
possible to conclude that Parkinson's disease is the
consequence of disorders of learning processes in the basal
ganglia (in its skeletomotor loop), i.e., a disorder of the error
distribution system. As a result, the model incorrectly
predicts the state of the object in these patients (Fig. 4).


Figure 4. Probability of the controlled object being in a
definite state, shown for a normal subject and for Parkinson's
disease (horizontal axis: object state). When the model functions
correctly, the prediction coincides with the probability
distribution generated by real afferent flow. In Parkinson's
disease, the model functions incorrectly, and the predicted and
real probability distributions do not coincide.

Let us analyze how incorrect prediction leads to symptoms in
Parkinson's disease. It is clear that symptoms depend on which
parameters of the object are predicted incorrectly (which digits
of a "counter", lower or higher, are set improperly). When
the state of antagonistic muscles is predicted incorrectly,
there will be rigidity or tremor. The latter case resembles
overregulation, in which the model always misses the
equilibrium point. More complex explanations are needed for
bradykinesia. In this case higher level subloops are
involved, and the prediction of the model differs significantly
from the real state of the object. The model may also predict
several states with equal probabilities. Therefore, it takes
much more time for the system to choose among these states, to
decide which of them is more probable.
Let us consider one of the simplest analogies: the interaction of
antagonistic reflexes at the spinal level. It is well known that
when receptive fields for antagonistic reflexes are stimulated
simultaneously, it is impossible to predict which of these
reflexes will be evoked. The latent period of the evoked reflex
is usually much longer than in the normal situation.
Moreover, sometimes neither of these reflexes appears at all.
Severe akinesia is the result of a completely unrealistic
prediction of the model.

The most curious problems unearthed by observations
from functional neurosurgery for Parkinson's disease
still remain unclear. For example, why do partial lesions of
particular structures within the neural network of the basal
ganglia - thalamocortical loop, such as the globus pallidus pars
interna and some thalamic nuclei, improve symptoms? This
observation contradicts common sense: How is it possible to
improve the network function by destroying a part of it?
Moreover, chronic stimulation of some of the same structures
produces an identical effect. This is a fundamental paradox that
until now has not been reasonably explained by reflex theory
or by the balance of excitation and inhibition that is postulated
to occur in the basal ganglia. In the case of stimulation, two
possible mechanisms exist: (1) stimulation produces a
functional block in regions immediately adjacent to the
electrode tip; and (2) influences spread to other brain
regions via both fibers passing through the stimulated
region and axons of neurons excited during stimulation.
However, these latter influences do not provide the system
with meaningful information for signal processing. Thus, the
second mechanism simply introduces noise into the network.
Therefore an obvious question appears: Why does noise
added to such a system improve its function? Finally, it is
well known that a parkinsonian symptom such as tremor can
be effectively removed by placing a lesion in the thalamic
nucleus that conveys information from the cerebellum to the
cortex, which means that the real afferent flow is changed by
this procedure.

From the point of view of the developed theory, these medical
facts can be explained as follows. When the network generating
model afferent flow is partially destroyed, the probability
distribution of possible object states becomes lower and wider.
Obviously, predicted states can then partly overlap with possible
real states of the object. As a result, the system no longer
finds an error in its prediction and does not try to correct the
object position in its state space. The situation is symmetrical
when the lesion is placed in the network processing afferent
flow. Model flow is not changed. What is changed is the
probability distribution predicted by real afferent flow, and
the two distributions generated by model and real flows start to
overlap. The latter leads to alleviation of parkinsonian
symptoms, for instance, tremor.
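
The argument that a lower and wider distribution overlaps more
with the other one can be checked numerically. The sketch below
assumes both distributions are normal and measures their overlap
with the Bhattacharyya coefficient. That choice, the numerical
values and the detection threshold are all assumptions made for
illustration and are not given in the paper.

```python
import math


def overlap(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Bhattacharyya coefficient of two normal densities (1.0 = identical)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    scale = math.sqrt(2.0 * sigma1 * sigma2 / var_sum)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))


THRESHOLD = 0.5   # hypothetical overlap above which no error is detected

# Assumed numbers: the model predicts state 0, real afferent flow indicates 3.
before = overlap(0.0, 1.0, 3.0, 1.0)           # parkinsonian mismatch
model_lesion = overlap(0.0, 3.0, 3.0, 1.0)     # model distribution lower and wider
afferent_lesion = overlap(0.0, 1.0, 3.0, 3.0)  # real-flow distribution widened
```

With these numbers the overlap rises from about 0.32 to about
0.62 in either lesion case, so a system that compares it against
the assumed threshold would stop reporting an error even though
the two means still disagree.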

An explanation for the effects of chronic stimulation follows
from what was said above. A functional block acts as a partial
destruction of the structure. Therefore, the mechanism of chronic
stimulation is partly similar to placing a lesion. Concerning
noise, another possible factor that can work in the case of
chronic stimulation, it is well known that adding noise
effectively helps the system to slide down the error surface to
its global minimum, much like shaking an uneven declining surface
helps a ball slide down to the lowest point on that surface.
While the results of chronic stimulation are similar to the
clinical results of placing a lesion, stimulation does not
produce immediate neural tissue destruction. Furthermore, one can
stop stimulating tissue by turning off the source of current,
thus producing a reversible type of functional lesion.
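
The "shaking" analogy can be tried out on a one-dimensional
error surface with two minima. The sketch below is hedged
throughout: the quartic surface, the step size, the decaying
Gaussian noise and the function names are assumptions chosen for
illustration, and whether a particular run crosses into the
deeper well depends on the noise level and the seed.

```python
import random


def error(x: float) -> float:
    """Hypothetical error surface: local minimum near 1.13, global near -1.30."""
    return x ** 4 - 3.0 * x ** 2 + x


def slope(x: float) -> float:
    """Derivative of the error surface."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def descend(x: float, steps: int = 500, rate: float = 0.01) -> float:
    """Noise-free descent: settles into the nearest minimum."""
    for _ in range(steps):
        x -= rate * slope(x)
    return x


def shake(x: float, steps: int = 2000, rate: float = 0.01,
          sigma: float = 0.3, seed: int = 0) -> float:
    """Descent with decaying Gaussian noise; returns the best point visited."""
    rng = random.Random(seed)
    best = x
    for k in range(steps):
        noise = rng.gauss(0.0, sigma * (1.0 - k / steps))
        x = max(-3.0, min(3.0, x - rate * slope(x) + noise))  # keep x bounded
        if error(x) < error(best):
            best = x
    return best


stuck = descend(2.0)      # trapped in the local minimum
shaken = shake(stuck)     # same system with noise added
```

Noise-free descent started at 2.0 stops in the shallow well. The
noisy run starts from that point and keeps the best point it
visits, so its error can only be equal or lower.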

One critical aspect of this explanation should always be kept
in mind. Placement of a lesion or chronic stimulation within
an OCS network improves symptoms, but does it make the
controlling system function normally? The answer is no!
After the lesion has been placed, the system is effectively
tricked and does not find any errors in its prediction.
However, normal function of the controlling system is not
restored. Therefore, functional neurosurgical procedures in
Parkinson's disease (PD) should be considered merely as
palliative, symptomatic interventions and not curative, because
they do not stop the fundamental degenerative process. Only
restoration of the structural integrity of the system by rewiring
of lost connections, or at least prevention of further
dopaminergic cell loss, would be a truly effective form of
treatment for PD. Functional neurosurgical procedures thus have
to be considered as methods of treatment based on the holographic
properties of biological neural networks, which means that the
system can function while partially destroyed. This is similar to
holography, where a complete three-dimensional image of an
object can still be reproduced after partial destruction of the
photographic plate, but with a lower resolution.

4.  OTHER  CORTICO  -  BASAL  GANGLIA  - 
THALAMOCORTICAL  CIRCUITS 

The skeletomotor loop described above controls movement of
such objects as the animal's body and environmental
objects. The oculomotor loop is, in principle, very similar to
the skeletomotor loop. Obviously, if the level of parameter
abstraction in a system such as the cortico - basal ganglia -
thalamocortical circuit became sophisticated enough to
perform "movement" of abstract objects in an abstract state
space out of real time, there would clearly be the creation of a
fundamentally new functional feature, and the process of
elaborately detailed multistep planning would be possible,
i.e., the capacity for thought. Prefrontal loops are the
substrate where such operations take place.
The  above-described  theoretical  approach  can  be  utilized  for 
explaining  the  functional  role  of  the  cingulate  gyrus  and  its 
cortico  -  basal  ganglia  -  thalamocortical  loop.  The  latter  loop 
is often referred to as the "limbic" loop. It is a part of the limbic
system.  According  to  numerous  experimental  data,  the  limbic 
system  controls:  Activities  essential  for  the  self-preservation 
of  the  individual,  including  emotional  ones  (e.g.,  feeding 
behavior  and  aggression,  behaviors  that  are  often 
accompanied  by  emotions);  activities  essential  for  the 
preservation  of  the  species  (e.g.,  mating  behavior, 
procreation,  and  the  care  of  the  young);  visceral  activities 
associated  with  both  of  the  above  and  numerous  other 
activities  of  the  hypothalamus;  mechanisms  for  memory. 

REFERENCES 

[1]  Alexander,  G.E.,  DeLong,  M.R.  and  Strick,  P.L.  (1986): 
Parallel  organization  of  functionally  segregated  circuits 
linking  basal  ganglia  and  cortex.  Ann  Rev  Neurosci  9:  357- 
381 

[2] Baev, K.V., and Shimansky, Yu.P. (1992): Principles of
organization of neural systems controlling automatic
movements in animals. Progr Neurobiol 39: 45-112
[3]  Le  Moal,  M.,  and  Simon,  H.  (1991):  Mesocorticolimbic 
dopaminergic  network:  functional  and  regulatory  roles. 
Physiol  Rev  71:  155-234 

[4]  Ljungberg,  T.,  Apicella,  P.,  and  Schultz,  W.  (1992): 
Responses  of  monkey  dopamine  neurons  during  learning  of 
behavioral  reactions.  J Neurophysiol  67:  145-163 
[5]  Houk,  J.C.,  Adams,  J.L.,  and  Barto,  A.G.  (1995):  A 
model  of  how  the  basal  ganglia  generate  and  use  neural 
signals  that  predict  reinforcement.  In:  Models  of  Information 
Processing  in  the  Basal  Ganglia,  Houk,  J.C.,  Davis,  J.L.,  and 
Beiser,  D.G.,  eds.  Cambridge,  Massachusetts,  London, 
England:  Bradford  Book,  The  MIT  Press,  pp.  249-270 
[6]  Brodal,  A.  (1981):  Neurological  anatomy.  In  relation  to 
clinical  medicine.  New  York,  Oxford:  Oxford  University 
Press 

[7]  Bergman,  H.,  Wichmann,  T.,  and  DeLong,  M.R.  (1990): 
Reversal  of  experimental  parkinsonism  by  lesions  of  the 
subthalamic  nucleus.  Science  249:  1436-1438 
[8]  Greene,  K.A.,  Marciano,  F.,  Golfinos,  J.G.,  Shetter,  A.G., 
Lieberman,  A.N.,  and  Spetzler,  R.F.  (1992):  Pallidotomy  in 
levodopa  era.  Adv  Clinical  Neurosci  2:  257-281 




Learning  How  To  Know: 
Semiotics  and  Multiscale  Cybernetics 


A.  Meystel 
Drexel  University 
National  Institute  of  Standards  and  Technology 


Abstract.  In  this  paper  a  comparison  is  made  of 
Semiotics  and  Cybernetics.  It  was  found  that 
semiotics can be considered a cybernetics of the m-th
order. The latter can be characterized by the
phenomena  of  self-reflection  and  multiscale 
representation.  A  syllabus  for  the  corresponding 
course  is  outlined. 

Key  words:  cybernetics,  m-th  order,  multiscale 
representation,  reflection,  semiotics 

Knowledge  Acquisition.  Intelligent  systems 
acquire  knowledge  through  preprogramming  and 
learning.  These  processes  deal  with  information 
represented  in  signs  (labels,  elementary  codes)  and 
symbols  (generalizations  of  signs).  The  synapses  of 
our  nervous  system  generate  signs  and  symbols, 
and  are  signs  and  symbols  themselves.  The 
formation  and  use  of  signs  and  symbols  is  analyzed 
within  the  discipline  of  semiotics.  In  semiotics,  the 
central  process  under  consideration  is  called 
semiosis. This process takes the form of an
elementary  loop  of  acquisition  and  processing 
which  represents  both  functioning  and  learning. 

Semiosis consists of the following
subprocesses  [1,2]: 

a)  encoding  of  the  sensations  (sets  of  signals  from 
transducers  interpretable  as  states  of  the  reality) 
by  using  elementary  signs 

b)  associating  encoded  sensation  with  codes  of 
actions  (sets  of  phenomena  which  entail  the 
commands  generated  by  the  system) 

c)  constructing  strings  of  codes  for  "states-actions- 
..."  (S-A-S-A...)  stored  in  the  memory 

d)  assigning  to  these  strings  of  signs  values  of 
goodness  J  interpretable  under  specific  goals  G, 
which  allow  interpretation  of  "experiences"  E: 
[G,(S-A-...)]-->J 

e)  discovering  classes  of  experiences  {E}, 
assigning  labels  (signs,  symbols)  to  them, 
assigning  signs  to  their  components  (generation 
of  the  concepts  of  objects  and  actions) 

f)  forming  hypotheses  of  new  behavioral  rules 
using  previously  stored  results  of  prior  processes 
of  semiosis. 
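Subprocesses a) through f) can be made concrete with a small
sketch. The Python below is illustrative only and is not taken
from [1,2]: the quantizing encoder, the goodness measure
(negative distance of the final state from the goal), and all
function names are assumptions chosen for brevity.

```python
from collections import defaultdict


def encode(sensation, granule=1.0):
    """(a) Encode a raw reading as an elementary sign (a quantized label)."""
    return round(sensation / granule)


def interleave(states, actions):
    """(b, c) Associate states with actions and build the string S-A-S-A-...-S.

    Expects one more state than actions (the state reached after the last action).
    """
    string = []
    for state, action in zip(states, actions):
        string += [state, action]
    string.append(states[-1])
    return tuple(string)


def goodness(goal, experience):
    """(d) Goodness J of an experience under goal G (assumed measure)."""
    return -abs(goal - experience[-1])


def record(memory, goal, readings, actions, granule=1.0):
    """Store one experience E together with its value J: [G, (S-A-...)] --> J."""
    states = [encode(r, granule) for r in readings]
    experience = interleave(states, actions)
    memory.append((experience, goodness(goal, experience)))


def hypothesize_rules(memory):
    """(e, f) Group experiences into classes labeled by (initial state, first
    action), then hypothesize the rule 'in this state, take the action whose
    class has the best mean goodness'."""
    classes = defaultdict(list)
    for experience, j in memory:
        classes[(experience[0], experience[1])].append(j)
    best = {}
    for (state, action), values in classes.items():
        mean_j = sum(values) / len(values)
        if state not in best or mean_j > best[state][1]:
            best[state] = (action, mean_j)
    return {state: action for state, (action, _) in best.items()}


memory = []
record(memory, goal=5, readings=[2.1, 3.0], actions=["heat"])
record(memory, goal=5, readings=[2.0, 1.2], actions=["cool"])
rules = hypothesize_rules(memory)
```

With the two stored experiences above, the hypothesized rule
for the encoded state 2 is "heat", because that experience
ended closer to the goal.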


Statements "a" through "f" can be formed
only for a particular "level of resolution" (or
"granularity"). The level is characterized by two
major constraints in focusing attention (from above
and from below). The first constraint corresponds to
the scope of view, the second to the minimum
"granule" (the "indistinguishability zone"
introduced in [3]).

All semiotic processes contain an operation
of generalization which is typical for all processes
of knowledge acquisition. The need for
generalization is determined by the orientation
toward complexity reduction. All algorithms of
generalization consist of procedures of grouping,
focusing attention, and combinatorial search (see
GFACS in [2]). A special feature of the semiotic
system is that GFACS is applied to the
results it produces at its output. Thus,
multigranular systems emerge.
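The recursive application of generalization to its own output
can be sketched as follows. This is a deliberately reduced
illustration, not the GFACS algorithm of [2]: grouping is done
by simple proximity on a line, the granule size stands in for
the window of attention, combinatorial search is omitted, and
the names and the coarsening factor are assumptions.

```python
def group(values, granule):
    """Grouping: chain together sorted values whose gap is at most `granule`."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= granule:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def generalize(values, granule):
    """One pass of generalization: each group is replaced by its mean."""
    return [sum(c) / len(c) for c in group(values, granule)]


def multigranular(values, granule, factor=4.0, max_levels=10):
    """Apply generalization to its own output, coarsening the granule each time.

    Returns a list of levels, from the finest (the input) to the coarsest.
    """
    levels = [sorted(values)]
    while len(levels[-1]) > 1 and len(levels) < max_levels:
        levels.append(generalize(levels[-1], granule))
        granule *= factor
    return levels


levels = multigranular([1.0, 1.2, 5.0, 5.4, 20.0], granule=0.5)
```

Each entry of `levels` is a representation of the same data at
a coarser granularity than the one before it; the loop stops
once a single entity remains.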

Cybernetics vs Semiotics. The loop of semiosis
bears a striking similarity to a cybernetic view of the
system. The sequence "a" through "f" describes
computational activities at a single level of granularity
in a system for knowledge acquisition, generalization,
and utilization (e.g., "action"), i.e., in a semiotic system.
Unlike  a  classical  cybernetic  system  the  semiotic 
system  generalizes  the  acquired  information  and 
processes  it  again  many  times,  thus  building  up  a 
system  of  multiresolutional  nested  loops.  Each  of 
these  loops  is  a  classical  cybernetic  loop.  A  system  of 
these  loops  allows  for  considering  m  levels  of 
granularity. 

In classical cybernetics, the issues of granularity
are considered to be a source of special effects (e.g.,
nonlinearity) but do not interfere with the
architecture of the "loop": "...vibrations of the
molecules  in  the  nucleic  acid  complexes  may  be 
responsible  for  their  behavior  as  organized  systems 
...in  dynamical  systems  of  a  much  coarser  texture  the 
vibration  properties  play  a  large  part  in  their 
organization.  This  is  true  both  with  biological  and 
with  engineering  systems."  [4] 
It is typical for the analysis of a semiotic
system to be focused upon the meaning of the
system and the process: after answering the What?
question, Why? is the question that the
semiotician attempts to answer. In cybernetics, the
same elementary signal processing loop is a source
of different insights. How? and How to control?
are the typical questions for the cyberneticist. As a
result, a semiotic system turns out to be equipped
with the means of generating feedforward commands
which contain the goals and tasks, and it becomes a
conventional cybernetic system.
Levels  of  various  granularity.  In  other 
words,  the  cybernetic  loop  of  system  functioning  is 
almost  equivalent  to  the  semiotic  loop  of  semiosis 
with the following difference: the first describes
control  processes  in  physical  terms  while  the 
second  analyzes  the  processes  in  the  multigranular 
sign-symbol  domain.  A  more  serious  difference  is 
in  the  emphasis:  since  semiotics  is  pursuing  the 
meaning  of  the  system,  the  goal  is  always 
questioned,  and  another  loop  should  be  discussed  at 
which  the  goal  is  just  a  variable.  This  new  loop  (a 
coarser  granularity  loop)  has  a  different  goal,  and 
the  goal  of  the  first  loop  is  just  a  variable  for  this 
new  loop.  The  pursuit  of  meaning  leads  us  to 
discovery  of  loops  of  different  granularity. 
(Actually, emergence of the objects and processes
of coarser granularity was ingrained in the stage "g"
of the description of the process of learning.)
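The idea that the goal of one loop is merely a variable of a
coarser loop can be illustrated with two nested loops. The
sketch below is an assumption-laden toy: a proportional update
stands in for the fine loop, and the coarse loop moves the fine
goal in bounded steps toward its own goal; none of the names or
gains come from the text.

```python
def fine_loop(x, goal, gain=0.5, steps=10):
    """Fine-granularity loop: drives the state x toward a goal it takes as given."""
    for _ in range(steps):
        x += gain * (goal - x)
    return x


def coarse_loop(x, coarse_goal, periods=5, max_step=1.0):
    """Coarse-granularity loop: the fine loop's goal is just a variable here.

    In each period the fine goal is moved at most `max_step` toward the
    coarse goal, and the fine loop is then run against that fine goal.
    """
    fine_goal = x
    history = []
    for _ in range(periods):
        step = max(-max_step, min(max_step, coarse_goal - fine_goal))
        fine_goal += step
        x = fine_loop(x, fine_goal)
        history.append((fine_goal, x))
    return history


history = coarse_loop(x=0.0, coarse_goal=3.0)
```

Here `history` records, for each coarse period, the goal handed
down to the fine loop and the state the fine loop reached.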

When  m  levels  of  granularity  (resolution, 
scale)  emerge  as  a  result  of  learning  in  the  semiotic 
system,  similar  levels  emerge  in  the  corresponding 
cybernetic  system.  A  cybernetic  system  of  the  m-th 
order  emerges,  or  multiscale  cybernetics.  The 
multiscale  cybernetic  systems  can  be  characterized 
by  a  multigranular  (m-loops)  semiosis  which 
invokes  a  variety  of  interesting  and  important 
phenomena  including  reflection  ("reflexia"),  self- 
organization  and  emergence  (usually  attributed  to 
the  situation  of  "complexity")  and  others. 

Can  we  talk  about  virtual  equivalence  of 
semiotic  and  cybernetic  systems  just  by  considering 
a  single  level  loop?  We  can  but  the  powerful 
concepts  of  the  semiotic-cybernetic  scientific 
paradigm  will  be  wasted.  We  do  not  need  the  tools 
and  techniques  of  either  in  a  single  loop  case.  On 
the  other  hand,  if  learning  is  involved,  if  planning 
is  of  interest,  if  processes  of  self-organization  and 
self-description  are  of  significance,  no  single  loop 
(single  level)  discussion  can  be  productive:  such  a 
simplification  will  have  hidden  mistakes. 

At a single level of resolution, the relation
between constructive mathematics and
non-standard analysis can be demonstrated. Both are
characterized by illustrative paradoxes which
demonstrate that movement to another level is
required.

The  joint  semiotic-cybernetic  paradigm 
allows  us  to  address  all  these  issues  and  explore  the 
research  developments  implied,  and  the  applications 
of  the  expected  theoretical  results. 

Semiotic  Architectures.  A  long  time  ago 
people  discovered  that  intelligent  systems  and 
learning  processes  depend  on  architectures  of  sign- 
symbol  processing.  These  architectures  determine 
how  intelligent  systems  perceive  the  world, 
recognize  objects  in  it  and  interpret  the  results  of 
recognition  for  the  subsequent  actions.  The 
multigranular  architectures  of  our  intelligence  and 
the  multigranular  representations  generated  by  our 
intelligence  are  affected  by  the  processes  of  single 
and  multi-level  self-reflection. 

Architectures  of  sign  processing  affect  the 
ways  intelligent  systems  behave  in  the  world.  They 
depend  on  the  way  these  systems  think  they 
behave.  Careful  analysis  demonstrates  that  all  our 
knowledge  depends  on  our  self-reflection.  Our  own 
brain-architectures  are  reflected  in  all  the  knowledge 
we  acquire,  even  when  we  think  that  we  analyze  an 
electron,  a  spray  casting,  or  a  bacterium.  All  our 
representations  reflect  the  architecture  of  our  brain. 
Architectures  of  intelligent  systems  incorporate  the 
recursion  of  self-reflection  which  becomes  crucially 
important  for  describing  interactions  among 
multiple  intelligent  systems. 

In  intelligent  systems  knowing  always 
presumes  changing  resolution  or  granularity  or 
scale  (which  is  the  same).  Therefore,  learning 
brings  on  changing  the  scale  of  representation  and 
subsequent  changes  in  processes  of  decision 
making,  planning,  and  error  compensation  within 
this  scale  of  representation.  Each  next  step  of 
learning  leads  to  the  generation  of  a  next  order  of 
cybernetic  analysis.  Once  a  higher  level  (level  of 
coarser  granularity)  is  achieved,  all  previously 
known  (finer  resolution)  subsystems  are  subsumed 
by  (aggregated  within  the  model  of)  the  level  of 
coarser  granularity. 

Cybernetics  was  a  step  in  learning  how  to 
know.  We  have  discovered  the  phenomenon  of  self- 
organization  via  planning  and  error  compensation 
during  functioning.  Self-organization  was  a  step 
toward  self-reflection.  With  realization  of  the 
phenomenon  of  self-reflection  we  started  paying 
more  attention  to  the  techniques  of  interpretation. 
The results of our interpretation are strongly
affected by our "self" hidden in the techniques of
our intelligence. Then, the cybernetics of the
second order emerged (this term was introduced by
H. von Foerster to describe the level produced by
self-reflection).

Socio-scientific Aspect of Knowledge
Acquisition. People have always been respectful of and
comfortable with activities related to knowledge
acquisition and organization, from the broadminded
Aristotle to the narrowly trained professionals of the
twentieth century. Problems related to
knowledge emerged when we entered the era of
intelligent systems:

a)  we  realized  that  within  a  single  problem 
we  should  address  modeling  of  the  system  at  several 
levels  of  resolution  (granularity,  scale) 

b)  we  started  analyzing  processes  of  self- 
reflection  when  in  addition  to  the  world  produced  by 
sensations  we  encountered  a  world  produced  by  our 
own  representations  which  included  representation 
of  ourselves. 

How Is Semiotics Related To Other
Disciplines? The gist of the movement toward
Unified Science was rejected by the reality of the
socio-scientific process. However, the essence of it
cannot be discarded, for the following
reasons:

•  all  sciences  are  based  upon  the  system  of 
symbols  and  their  transformation  induced  by  our 
brain  architecture 

•  thus all phenomena of the world perceived
and reasoned about by us are preprocessed so as to be
fit for interpretation by these architectures

What makes the natural sciences distinct is
something that can be called "legacy issues": the
way people in a particular socio-scientific niche
are used to talking about the entities of their
subdomain, a legacy with which one is advised not
to interfere. Nevertheless, it seems prudent to
study the invariance of the natural sciences before
we embark on the doubtful enterprise of
unconditionally solidifying the obstacles to
generalization that are copiously collected within
the bulk of legacy issues. The following temporal
sequence of knowledge acquisition can be
recommended in the imaginary syllabus of a
course on "Semiotics of Knowledge":

1. Science of Signs Formation which includes:

•  Energy  and  Complexity 

•  Entity formation techniques: Grouping,
Focusing Attention, Combinatorial Search (GFACS)


•  Generalization  and  formation  of  the  entity- 
relational  structure 

•  Continuity  and  discretization 

•  Granularity  and  Resolution 

•  The  phenomenon  of  recognition 

•  Equivalence,  Resemblance,  Similarity 

A significant issue (among others) is the skill and
the habit of notations: they carry with them
premises, assumptions and prejudices.

2.  Science  of  representation  including  Syntactics 
and  Semantics. 

3.  The  Phenomenon  of  Interpretation 

4.  Sciences  of  Generalized  Reasoning  including 
Logic, Mathematics, and Computer Science

5.  Knowledge  Engineering  and  Epistemology 

6.  Science  of  Value  Judgment  including  methods  of 
evaluation  and  optimization  of  Goodness  and 
Beauty. 

7.  [Intuitive]  World  Modeling  in  Art  and  Literature 

8.  [Weak]  World  Modeling  (Experimental 
Knowledge  Classification)  in  Zoology,  Botany, 
Medicine,  Geography,  History,  Economics 

9.  [Strong]  World  Modeling  (Architecture  Analysis) 
with  the  legacy  disciplines  of  Physics, 
Chemistry,  and  Biology 

10.  Problem  identification  and  creative  solving: 
Theory  of  Design 

11. Philosophy and Psychology: Theory of Mind

12.  Control  Theory 

13.  Information  Theory 

14.  Science  of  Learning  including  Process  of 
Semiosis,  Adaptation,  Evolution,  Machine 
Learning 

15.  Engineering 

16.  Life  Science,  Evolution,  Genetics 

17.  Large  Complex  Systems  (including  Sociology 
and  others) 

18.  General  Pragmatics  and  Symbol  Grounding 

A special role of Control Theory. In the 1960s, the
idea surfaced that the theory of control should go further
than just constructing mathematical models of dynamic
systems (K. S. Fu, G. Saridis). The annual
symposium on intelligent control was initiated in
1984, when it became clear that Control Theory
benefits from the unified application of techniques and
ideas typical of Operations Research (OR) with its
search, and Artificial Intelligence (AI) with its logics
and linguistics. In the course of events, it became clear
that there is more to the situation than just a merger of:

Control  Theory+OR+AI. 
Some groups persistently associated intelligence
with  neural  networks,  fuzzy  set  theory,  and 
evolutionary  computation.  This  wing  of 
control/computational  science  exists  now  as  a 
community  of  Soft  Computing  for  whom  the  body  of 
intelligent  control  is  a  sum: 

NN+Fuzzy  Sets+Evolutionary  Computation. 

Finally,  many  people  believe  that  it  is  very 
important  to  determine  everything  by  the 
architectural issues, which makes problems
dependent on the cases of intelligent systems known
from  the  biology  of  nervous  systems  in  living 
creatures,  laws  of  language,  physics,  and  technical 
solutions  in  autonomous  robotics.  Thus,  a  new  set 
of  interrelated  domains  was  added  to  the  area  of 
Intelligent  Control: 

Architectures+Brain+Semiotics 

One  can  see  that  these  three  communities  are 
not  independent,  they  rather  overlap  a  lot.  It  is 
tempting  to  say  that  the  truth  is  in  the  middle. 
However,  in  this  situation  finding  a  "middle"  is  a 
formidable  problem  by  itself. 

Conclusions.  Various  disciplines,  including 
knowledge  engineering,  artificial  intelligence, 
intelligent  control,  and  semiotics,  are  stages  in  the 
exciting  search  within  and  traveling  toward 
multiresolutional cybernetics, or cybernetics of the m-th
order. Such systems produce representations in
multiple  scales,  supplement  control  with  self- 
organization  and  throw  us  into  a  totally  new  world,  a 
multigranular  world,  in  which  we  always  live  even 
though  we  do  not  pay  too  much  attention  to  its 
multiresolutional  nature. 

Various  artifacts  of  life  and  technology 
should  be  discussed  from  the  point  of  view  of 
multiscale  cybernetics.  In  engineering  problems  and 
poetry,  scientific  research  and  visual  arts, 
everywhere  we  will  continue  to  discover  the 
multiresolutional  world  of  semiotic-cybernetic 
models. 
ABSTRACT

As  the  discipline  concerned  with  communication  and 
signification  in  their  broadest  senses,  semiotics  has  much  to 
offer  for  the  development  of  truly  intelligent  systems.  This 
paper  explores  the  relevance  of  semiotics  to  intelligent 
control.  I  focus  in  particular  on  the  role  of  the  semiotic  sign 
in  control  systems,  distinguishing  several  types  of  signs 
involved.  These  include  signs  used  to  communicate  system 
states,  human  user  inputs,  and  environmental  variables  to  the 
control  system,  signs  used  to  communicate  control  actions  to 
the  system,  and  signs  used  internally  by  the  control  system 
for  its  functioning.  Viewing  these  various  flows  of 
information  as  general  signification  activities  broadens  our 
conventional  notion  of  control,  and  renders  it  more  relevant  to 
large-scale,  complex  processes.  Several  technologies  that  can 
contribute  to  the  "semioticization"  of  control  are  noted: 
multiscale  representations,  fuzzy  logic,  learning  and 
adaptation,  computational  linguistics,  and  classical  control. 

Keywords:  control  systems,  multiresolution  models, 
fuzzy  logic,  learning  systems,  computational  linguistics. 

1.  Visions  of  Intelligent  Control 

The  successes  that  have  been  achieved  with  our  existing 
control  technology  over  the  last  few  decades  are  a  mixed 
blessing.  On  the  one  hand,  control  technologists  can  take 
justifiable  pride  in  contributing  to  the  technological  progress 
of  our  societies.  On  the  other  hand,  further  research  in 
classical  control  appears  to  be  a  path  of  incremental, 
evolutionary  improvements.  Yet  challenging  control 
problems  are  still  far  from  being  solved — practical 
learning/adaptive  systems  for  real-world  applications  are  still 
far  from  reality,  to  take  one  example.  The  interest  in 
"intelligent  control"  reflects  both  the  recognition  of  the 
outstanding  research  needs  in  control  and  the  view  that 
classical  control  methods  cannot  by  themselves  satisfy  these 
needs.  There  remains,  however,  a  considerable  mismatch 
between  the  motivation  for  intelligent  control  and  much  of 
the  research  that  is  currently  being  conducted  under  its  aegis. 
Single-loop  neural  and  fuzzy  controllers  are  only  a  small  part 
of  the  solution — at  best. 

One  critical  limitation  of  most  research — whether  in  a 
classical  or  intelligent  control  vein — is  the  simplistic,  highly 
restricted  code  theory  that  is  assumed  to  underlie 
communication and signification in control systems. Aspects
of this communication include: a user's command to the
control  system,  the  representation  of  sensory  measurements, 
controller  design  parametrization,  system  models  as  used  in 
the  controller,  and  the  control  action  itself.  In  each  of  these 
cases,  current  control  systems  only  permit  simple  types  of 
information  structures,  such  as  numerical  vectors. 

As  the  discipline  concerned  with  communication  and 
signification  in  their  broadest  senses,  semiotics  has  much  to 
offer  control  science  and  engineering.  This  "position  paper" 
attempts  to  demonstrate  this  relevance.  The  next  section 
discusses  how  a  control  system  can  be  viewed  as  a  semiotic 
system,  the  latter  being  taken  to  mean  a  system  in  which  the 
Peircean  triad  of  object/sign/interpretant  is  realized.  Section  3 
outlines  several  possible  roles  for  signification,  noting  how 
they  extend  our  current  thinking  in  control.  Semiotics  is  not, 
at  least  not  yet,  a  technology  with  packaged  solutions  to 
offer,  but  several  technological  fields  have  significant 
connections  with  it.  Some  of  these  are  briefly  discussed  in 
Section  4.  Section  5  contains  some  concluding  remarks. 

This  paper  is  speculative.  It  presents  no  crisp  technical  ideas. 
I  am  hoping  that  it  can  serve  as  grist  for  discussing  the  role  of 
semiotics  in  intelligent  systems  in  general,  and  intelligent 
control  systems  in  particular.  The  majority  of  the  exposition 
is  abstract,  but  I  occasionally  refer  to  some  examples.  These 
are  taken  from  domains  I  have  some  familiarity  with — 
industrial  process  control  and  building  control — but  the 
exposition  is  intended  to  apply  to  control  systems  generally. 

2.  Roles  for  Semiosis  in  Intelligent 
Control  Systems 

I  discuss  in  this  section  how  a  control  system  can  be  viewed 
as  a  semiotic  entity,  placing  particular  emphasis  on  the  role 
of  the  sign  as  the  linchpin  of  semiosis.  The  notion  of  the 
sign  has  been  adapted  considerably  over  the  history  of 
semiotics,  and  it  is  relatively  recently  that  the  possibility  of 
artificial  entities  generating  and  interpreting  signs  has  been 
acknowledged.  The  very  thought  that  a  control  system  can 
engage  in  semiosis  may  still  seem  a  novelty  to  some. 

Nevertheless,  a  semiotic  explication  of  a  control  system  can 
be  suggested:  A  control  system  receives  signs  from  its 
environment,  interprets  and  processes  them  according  to 
models  explicitly  or  implicitly  embedded  in  it,  and  produces 
other  signs.  The  question  arises:  What,  if  any,  advantage  is 
gained  by  this  potentially  confusing  reformulation  over  our 
conventional  understanding  and  articulation  of  a  control 
system? 
For the lowest level of controllers, which comprises the
majority of them, the benefits may appear to be few. I do
not, for example, see how our understanding of the PID
algorithm can be improved given the semiotic insight relative
to its explication as the following expression (see footnote 1):

$$u = K_c\, e(t) + K_i \int e(t)\, dt + K_d\, \frac{de(t)}{dt}$$

Even for
loop control, though, the core algorithm is but one piece of
the  puzzle.  The  functioning  of  a  PID  controller  also  requires 
integral  windup  protection,  a  bumpless  transfer  mechanism, 
tuning  procedures,  manual  override  capability,  etc.  These 
functions  may  be  ancillary,  but  the  complexity  of  most 
exceeds  that  of  the  controller's  "primary"  element. 
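To make the point about ancillary functions tangible, the
following sketch pairs a discrete form of the PID law with one
such function, integral windup protection. It is a minimal
illustration under stated assumptions, not a description of any
particular industrial controller: the rectangular integration,
the backward-difference derivative, the "conditional
integration" anti-windup scheme, and the class and parameter
names are all choices made here for brevity.

```python
class PID:
    """Discrete PID with output limits and conditional-integration anti-windup."""

    def __init__(self, kc, ki, kd, dt, u_min, u_max):
        self.kc, self.ki, self.kd = kc, ki, kd  # proportional, integral, derivative gains
        self.dt = dt                            # sampling period
        self.u_min, self.u_max = u_min, u_max   # actuator limits
        self.integral = 0.0                     # running approximation of the integral of e(t)
        self.prev_error = None                  # previous error, for the derivative term

    def update(self, error):
        """Return the control output u for the current error sample."""
        if self.prev_error is None:
            derivative = 0.0                    # no history yet on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error

        candidate = self.integral + error * self.dt
        u = self.kc * error + self.ki * candidate + self.kd * derivative

        if self.u_min <= u <= self.u_max:
            self.integral = candidate           # accept the new integral only when unsaturated
            return u
        # Saturated: clamp the output and leave the integral frozen (anti-windup).
        return max(self.u_min, min(self.u_max, u))


pid = PID(kc=2.0, ki=1.0, kd=0.0, dt=1.0, u_min=-10.0, u_max=10.0)
outputs = [pid.update(e) for e in (1.0, 100.0, 1.0)]
```

In this run the large middle error saturates the output, and
because the integral is frozen during saturation the third
output is not inflated by the disturbance. Even in this toy,
the windup logic takes about as many lines as the control law
itself.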

In  this  larger  picture,  a  semiotic  perspective  is  meaningful, 
and  a  semiotic  triangle  associated  with  detecting  a  poorly 
tuned process is presented in Fig. 1. One manifestation of a
poorly  controlled  process  is  high  variance  in  the  process 
output,  here  assumed  to  be  temperature.  In  the  figure,  this 
sign  has  an  interpretant,  a  consequence  of  a  human  operator 
observing,  for  example,  the  temperature  versus  time  trend  on 
a  display.  The  interpretant  is  itself  a  sign — perhaps  a 
command  to  the  control  system  to  initiate  a  retuning  of  the 
PID  controller.  There  are  many  other  signs  of  the  considered 
object:  large  overshoot  in  step  responses,  high  control 
action,  poor  quality  product  downstream,  etc.  There  are  also 
many  potential  interpretants,  which  could  be  generated  by 
manual  or  automatic  systems. 
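An automatic interpreter of the kind just mentioned might be
sketched as below. The variance threshold, the command name
"retune_pid", and the dictionary layout are hypothetical; the
point is only that the interpretant produced from one sign
(high variance) is itself a sign that a further interpreter can
act on.

```python
from statistics import pvariance


def interpret(temperatures, variance_limit):
    """Read the sign 'large temperature variation' and produce an interpretant.

    Returns a command (itself a sign for the next interpreter) when the
    variance of the recent process output exceeds the limit, else None.
    """
    variance = pvariance(temperatures)
    if variance > variance_limit:
        return {"command": "retune_pid", "evidence": variance}
    return None


steady = interpret([70.0, 70.0, 70.0, 70.0], variance_limit=1.0)
noisy = interpret([60.0, 80.0, 60.0, 80.0], variance_limit=1.0)
```

For the steady trend no interpretant is generated; for the
oscillating trend the interpreter emits a retuning command
carrying the observed variance as evidence.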


[Figure 1: a semiotic triangle whose vertices are "Large
Temperature Variation" (Sign), "Poorly Controlled Process"
(Object), and "Effect on Operator" (Interpretant).]

Fig. 1. A semiotic triangle for a simple control
application.


The  chain  of  semiosis — a  sign  resulting  in  an  interpretant 
which,  being  a  sign  itself,  leads  to  another  interpretant,  and  so 
on — exists  in  control  systems  no  less  than  in  general 
semiotic  systems.  At  some  point,  the  chain  terminates  and  a 
physical  action  is  taken — the  retuned  controller  starts 
operating  a  valve  differently  than  before.  At  this  limit  of  the 
semiotic  activity,  its  triadicity  is  replaced  by  the  dyadic  action 
of the physical world. Since a control system operates by
feedback, the semiotic regression ultimately affects its object,
the system under control, thereby initiating another chain of
interpretants.

1 For those readers unfamiliar with PID controllers, the control
output u is computed as a weighted sum of three error terms:
the instantaneous error e(t) (i.e., the difference between the
current system output value and the desired value), the
integrated sum of this error, and the rate of change of this
error. $K_c$, $K_i$, $K_d$ are the respective weights and are
generally referred to as the proportional, integral, and
derivative gains.

There  is  as  yet  no  consensus  on  a  typology  of  signs.  Eco 
(1984)  analyses  various  historical  treatments  of  the  concept  of 
the  sign,  stressing  its  generality.  Peirce's  initial  classification 
differentiated  between  indices  (signs  in  physical  adjacency  to 
their  objects),  icons  (signs  similar  to  their  objects),  and 
symbols  (in  which  the  sign/object  connection  is  arbitrary). 
Eco  (1976)  labels  this  classification  "untenable"  and  suggests 
that  signs  can  be  classified  from  different  perspectives  (e.g., 
relative  to  the  channels  of  communication,  the  gradation  of 
the  sign,  and  its  origin).  Sebeok  (1994)  extends  Peirce's 
tripartite  scheme  with  three  additional  types:  signals, 
symptoms,  and  names. 

What  types  of  signs  are  appropriate  for  control  systems?  It  is 
useful  to  adopt  Eco's  distinction  between  a  signal  and  a  sign 
(Eco, 1976; p. 46; see footnote 2). A signal, by his definition, is "a
pertinent  unit  of  a  system  that  .  .  .  could  also  be  a  physical 
system  without  any  semiotic  purpose.  ...  A  signal  can  be  a 
stimulus  that  does  not  mean  anything  but  causes  or  elicits 
something."  In  our  earlier  example,  we  described  the 
operation  of  a  PID  algorithm  as  lacking  a  semiotic  character. 
We  can  thus  say  that,  when  the  process  output  error  signal 
serves  only  as  input  to  a  PID  controller,  it  is  solely  a  signal. 
It  assumes  a  semiotic  role,  and  the  status  of  a  sign,  only 
when  a  semiotic  triangle  is  configured  through  the  operation 
of  a  (human  or  machine)  interpreter  that  is  capable  of 
generating  an  interpretant  for  further  signification. 

Some  signs  relevant  to  intelligent  control  systems  are  shown 
in  Fig.  2,  which  also  differentiates  between  mandatory  and 
optional  signs.  In  the  general  case,  a  control  system  will 
receive  inputs,  and  these  may  have  signification  potential, 
from  its  environment,  its  human  users,  the  system  to  be 
controlled,  and  other  automated  systems.  Its  outputs,  again 
potentially  signs,  will  be  communicated  to  human  users 
(some  of  these  may  be  identical  to  the  human  users  that 
provide  input  to  the  control  system,  but  others  will  not  be), 
the  controlled  system,  and  other  systems.  Semiosis  is  not 
limited  to  the  interface  between  the  control  system  and  the 
world;  signification  occurs  internally  as  well.  I  am  assuming 
in this figure that any device that can be directly affected by
the  control  system  is  by  definition  part  of  the  controlled 
system.  The  environment  is  therefore  not  an  output. 

This  figure  is  not  intended  to  represent  a  far-off,  futuristic 
vision  in  which  intelligent  control  systems  may  operate  with 
complete,  unmitigated  autonomy.  Semiotics  can  play  a 
revolutionary role in control systems in the considerably
nearer term. Some role for the human will undoubtedly exist
in  this  case — at  a  minimum,  in  the  start-up,  shut-down,  and 
maintenance  of  the  control  system.  For  a  practical  intelligent 
control  system,  the  human  interaction  is  not  optional. 


2"Signal"  is  also  one  of  the  six  types  of  signs  classified  by 
Sebeok  (1994).  In  this  paper,  I  follow  Eco's  usage  since  it 
conforms  to  the  common  notion  of  signal  in  control  systems. 

3.  Sign Functions in Control Systems

Next,  I  briefly  discuss  the  signification  potential  for  some  of 
the  interactions  shown  in  Fig.  2.  None  of  these  inputs  and 
outputs  are  completely  absent  in  today's  control  systems. 
However,  I  hope  to  show  how  considering  the 
communications  depicted  in  Fig.  2  as  significations  in  a 
general  sense  naturally  results  in  a  broadening  of  what  we 
mean  by  a  control  system,  with  the  potential  for  dramatic 
advances  in  control  technology. 

3.1.  Signs  referring  to  states  of  the  controlled 
system 

Most  industrial  control  systems  are  configured  to  expect 
sensor  readings  to  consist  of  raw  numeric  information: 
temperatures,  pressures,  flows,  forces,  etc.  Sensor  readings 
are  only  intended  to  be  signals.  There  is  now  increasing 
interest  in  the  use  of  unconventional  modalities  in  feedback 
systems,  extending  the  notion  of  a  sensor  in  novel  ways. 
Control  engineers  are  actively  exploring  the  integration 
within  control  systems  of  "artificial  noses"  for  fugitive 
emissions  detection,  pattern  recognition  devices  for  visual 
quality control, vibration signature analyzers for rotating
machinery, etc. However, this exploration is proceeding in an
ad-hoc,  piecemeal  manner  that  does  not  recognize  the 
fundamental  signification  function  that  these  innovations  have 
in  common.  In  most  of  these  cases,  the  control  system  may 
provide  the  capability  for  a  human  to  view  the  sensor  output 
and  for  a  human  or  a  separate  stand-alone  system  to  process 
the  sensory  information,  but  this  limits  the  role  of  the  control 
system  to  a  transmission  medium.  The  video  image  or 
vibration  spectrum  may  be  a  sign  for  the  human,  but  it  is  not 
a  sign  for  today's  control  system. 

The  growing  use  of  neural  networks  and  related  technologies 
is  also  providing  a  limited  semiotic  capability  to  control 
systems,  but  much  more  can  be  done.  A  sensor  complex  that 
detects  the  number  and  distribution  of  people  in  a  conference 
room  could,  in  principle,  be  used  for  more  effective  and 
efficient  temperature,  humidity,  and  air  quality  control.  In 
current  systems,  realizing  these  improvements  would 
necessitate  one-of-a-kind  encoding  strategies.  In  contrast  to 
sensor signals, where (for example) the 4-20 mA standard is
universally  accepted,  there  is  no  standard  for  representing 
complex  information  regarding  system  state. 

These  shortcomings  can  be  attributed,  not  entirely  but  in  large 
part,  to  a  failure  to  recognize  that  control  systems  can  be 
semiotic  entities. 


[Figure 2 diagram. Inputs to the intelligent control system: the environment, human users, the controlled system, other systems. Outputs: human users, the controlled system, other systems.]

Fig.  2.  The roles of signs as input to, output from, and internal to, an intelligent control system. The lighter arrows imply optional significations.


3.2.  Signs  from  human  users 

Another  critical  aspect  of  communication  to  a  control  system 
is  user  commands.  Current  systems  adopt  unstructured,  flat 
codes,  a  limitation  that  is  not  problematic  for  the  tasks 
demanded  of  them,  such  as  regulating  the  temperature  of  a 
room  to  a  fixed  setpoint.  But  this  limitation  is  incompatible 
with  revolutionary  advances  in  control.  In  the  foreseeable 
future,  we  would  like  to  have  automatic  control  systems  that 
can  respond  to  commands  such  as  (for  a  building  cooling 
system):  "Attempt  to  reduce  energy  consumption  by  10% 
without  permitting  temperatures  to  rise  more  than  1°C  from 
nominal  settings  in  any  zone  and  without  resorting  to  full 
lighting  setback  except  in  unoccupied  areas."  The  expression 
need  not  be  in  a  natural  language,  although  that  is  a  worthy 
goal, but in either case the intelligence required to effect such
commands requires a semiotic sophistication well beyond
what  current  control  technology  offers,  or  even  seeks  to  offer. 
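
As a rough illustration of what a structured (non-flat) command code might look like, the building-cooling command above can be written as an objective plus a set of constraints rather than as a single setpoint. The sketch below is hypothetical: the class names, field names, variable names, and the `violated` check are assumptions made for illustration and do not describe any existing control system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constraint:
    """A limit the controller must respect while pursuing the objective."""
    variable: str        # hypothetical variable name, e.g. "zone_temp_rise_C"
    max_value: float     # upper bound on that variable
    scope: str = "all"   # where the limit applies


@dataclass(frozen=True)
class Command:
    """A structured user command: one objective plus its constraints."""
    objective: str                  # e.g. "reduce_energy_consumption"
    target_fraction: float          # 0.10 means "by 10%"
    constraints: tuple = field(default_factory=tuple)

    def violated(self, state: dict) -> list:
        """Return the constraints that the given state breaks."""
        return [c for c in self.constraints
                if state.get(c.variable, 0.0) > c.max_value]


# The example command from the text, in structured form (names are assumed).
cooling_command = Command(
    objective="reduce_energy_consumption",
    target_fraction=0.10,
    constraints=(
        Constraint("zone_temp_rise_C", 1.0, scope="any zone"),
        Constraint("full_lighting_setback", 0.0, scope="occupied areas"),
    ),
)

# A zone has drifted 1.4 degC above nominal, so one constraint is broken.
state = {"zone_temp_rise_C": 1.4, "full_lighting_setback": 0.0}
broken = cooling_command.violated(state)
print([c.variable for c in broken])
```

The point of the sketch is only that objective, target, and constraints are separately addressable parts of the message, which a flat setpoint code cannot express.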

Other  types  of  signification  from  users  can  also  be  catalogued: 
commands  that  relate  to  controller  operation  (such  as 
switching  from  manual  to  automatic  control),  the 
communication  of  prior  knowledge  about  the  controlled 
system  (such  as  model  order  or  parameter  estimates),  the 
communication  of  application-specific  heuristics  for  control 
(such  as  fuzzy  rules),  etc. 

3.3.  Signs  referring  to  controller  commands 
The  common  valve  today  is  restricted  in  its  control  input  to  a 
stem  position  command,  which  it  receives  every  sample  and 
executes  with  a  small  actuational  delay.  Consider,  however,  a 
valve  that  can  take  as  input  a  schedule  of  position  or  position 
change commands. Such a device would enable more efficient
utilization  of  control  system  resources.  Valve  technology  is 
not  a  stumbling  block:  smart  valves  with  self-diagnostic  and 
self-calibration  features  are  becoming  available  (adequately 
dealing  with  these  also  requires  a  sign-function  capability).  A 
primary  obstacle  is  the  highly  restricted  code  used  to 
communicate  the  controller  output  to  the  valve. 
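
To make the idea of a schedule-valued valve input concrete, the sketch below models a valve that accepts a list of timed position commands and holds the most recent one. It is a minimal illustration under stated assumptions: the `ScheduledValve` class, its method names, and the percent-open units are hypothetical and do not correspond to any particular smart-valve product or fieldbus protocol.

```python
from bisect import bisect_right


class ScheduledValve:
    """Hypothetical valve that accepts a schedule instead of one position."""

    def __init__(self, initial_position: float = 0.0):
        self.times = [0.0]
        self.positions = [initial_position]

    def load_schedule(self, schedule):
        """Load a list of (time_s, position_percent) pairs, in any order."""
        ordered = sorted(schedule)
        for _, pos in ordered:
            if not 0.0 <= pos <= 100.0:
                raise ValueError("position must be within 0..100 percent")
        self.times = [t for t, _ in ordered]
        self.positions = [p for _, p in ordered]

    def position_at(self, t: float) -> float:
        """Zero-order hold: return the most recent scheduled command.

        Before the first scheduled time, the first command is returned.
        """
        i = bisect_right(self.times, t) - 1
        return self.positions[max(i, 0)]


valve = ScheduledValve()
# One message carries three future stem positions instead of three samples.
valve.load_schedule([(0.0, 20.0), (5.0, 35.0), (12.0, 50.0)])
print(valve.position_at(7.5))
```

With such an interface the controller sends one message per schedule rather than one per sample, which is the resource saving the paragraph above refers to.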

Even  today,  control  systems  for  industrial  applications  are 
hierarchically  structured  (a  multivariable  controller  will 
typically  generate  setpoints  for  PID  controllers).  Thus  it  is 
already  commonplace  for  the  output  of  a  control  system  to  be 
an  input  to  another  control  system.  The  remarks  above 
regarding  the  signification  potential  for  controller  inputs  (in 
particular,  signs  that  refer  to  users'  expectations  of  the 
system)  suggest  that  controller  outputs  should  be  viewed  in 
the  same  light. 

3.4.  Signs  referring  to  controller  structures 
This  category  hides  considerable  complexity.  For  those  of  us 
involved  in  the  design  and  analysis  of  intelligent  control 
systems,  as  distinct  from  their  black  box  operation,  it  is  also 
the  most  interesting  one. 

The  key  point  here  is  that  some  sort  of  modularity  is 
inevitable  in  a  control  system  of  any  sophistication. 
Controllers for large-scale systems will comprise system
models (often compositional themselves), optimizers,
parameter estimators, state estimators, constraint handlers,
diagnostic modules, schedulers, planners, performance
monitors, adaptation blocks, and many other separable but
synergistic modules. Modularity does not imply a commitment to any
specific  structure,  such  as  a  strict  hierarchy.  On  the  contrary, 
more  amorphous  compositions  may  well  be  preferable.  One 
general  way  to  view  an  intelligent  control  system  is  as  an 
agent  architecture,  with  the  agents  representing  different 
(although  possibly  overlapping)  functionalities. 
Communication  between  agents  can  occur  through  shared 
"blackboards"  or  by  message-passing.  This  communication 
must  embody  a  semantic  richness  that  is,  in  all  likelihood, 
incompatible  with  representing  the  input/output  parameters  of 
the  modules  or  agents  as  unstructured  numerical  variables. 
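
The blackboard style of inter-agent communication mentioned above can be sketched in a few lines. This is an illustrative toy, not a proposal for a specific architecture: the `Message` and `Blackboard` classes, the topic name, and the message fields (quantity, unit, confidence) are all assumptions chosen to show a message that carries more than a bare number.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str     # name of the posting agent
    topic: str      # what the message is about
    content: dict   # structured payload, not a bare numeric variable


class Blackboard:
    """Shared store that agents post to and subscribe to by topic."""

    def __init__(self):
        self._entries = defaultdict(list)
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def post(self, message):
        self._entries[message.topic].append(message)
        for callback in self._subscribers[message.topic]:
            callback(message)

    def read(self, topic):
        return list(self._entries[topic])


alerts = []


def diagnostic_agent(msg):
    # Hypothetical diagnostic module: reacts to the sender's own confidence.
    if msg.content["confidence"] < 0.5:
        alerts.append(
            f"low-confidence {msg.content['quantity']} from {msg.sender}")


board = Blackboard()
board.subscribe("state_estimate", diagnostic_agent)
board.post(Message(
    sender="state_estimator",
    topic="state_estimate",
    content={"quantity": "zone_temperature", "value": 23.4,
             "unit": "degC", "confidence": 0.35},
))
print(alerts)
```

The receiving agent acts on the meaning attached to the value (what it measures, in which unit, how far it can be trusted), which an unstructured numerical variable would not convey.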

4.  Elements of a Semiotic Approach
to Intelligent Control

Semiotics  is  not  a  technology  in  the  sense  that  it  can  provide 
algorithms  or  devices  ready  for  implementation  in  intelligent 
systems.  However,  as  a  discipline  that  focuses  on 
signification  and  communication  in  general,  it  can  be 
considered  to  subsume  aspects  of  several  technologies.  In  this 
section,  I  briefly  discuss  some  of  these  connections,  and  their 
relevance  to  endowing  intelligent  control  systems  with 
semiotic  capabilities. 

4.1.  Multiresolution  models 

Intelligent  controllers  need  to  handle  target  systems  of  all 
scales,  from  single-loop  to  system-wide.  Representations  that 
are appropriate for this breadth of scope are needed. A naive
view may presume that one large "flat" model will serve the
purpose,  but  this  strategy  is  hardly  efficient,  or  even  feasible 
for  large-scale  systems. 

Current  control  schemes  already  exhibit  some 
compositionality,  with  a  hierarchy  of  implicit  or  explicit 
models: 

•  At  the  single-loop  level,  PID  or  (occasionally)  more 
sophisticated  controllers  are  implemented  with  heuristic 
knowledge  of  elementary  process  dynamics  and  little 
consideration  of  variable  couplings. 

•  Multivariable  controllers  explicitly  take  into  account  the 
interactions  between  different  components  of  a  small  or 
moderate  scale  system.  In  most  cases,  multivariable 
controllers  provide  setpoints  for  PIDs  rather  than  directly 
commanding  actuators. 

•  There  is  now  increasing  interest  in  higher  levels  of 
control — unit-wide  or  plant-wide  optimization  in  industrial 
processes,  for  example.  At  these  levels,  unlike  the 
preceding  two,  models  often  attempt  to  capture 
nonlinearities,  but  sacrifice  detailed  knowledge  of 
dynamics. 

Conventional  control  schemes  make  little  or  no  attempt  to 
integrate  these  various  control  and  modeling  scales.  Ensuring 
consistency  among  models  of  different  scales  is  the  task  of  the 
human  control  engineer. 

Multiple  scales  of  representation  are  also  required  on  the 
temporal  level.  In  most  large-scale  systems,  both  external 
disturbances  and  intrinsic  dynamics  will  occur  over  a  wide 
range of time scales. Energy consumption in commercial buildings, for
example,  will  show  variations  that  are  correlated  with  a) 
day/evening,  b)  weekday/weekend,  and  c)  seasonal  cycles.  In 
setting  interior  lighting  levels  and  lighting  setbacks  on  a  daily 
basis,  knowledge  of  the  (seasonal)  daylight  hours  can  be  used 
to  optimize  energy  efficiency. 

Wavelets  are  an  obvious  candidate  for  a  medium  that  can 
represent  a  wide  range  of  temporal  and  spatial  phenomena.  A 
few intriguing research results on incorporating wavelets in
control  schemes  have  been  reported,  and  more  work  in  this 
direction  is  needed.  One  limitation  of  current  work  in  this 
area  is  also  notable:  it  assumes  that  wavelets  are  inherently  a 
univariate  representation  scheme,  well  suited  for  capturing 
variations  at  multiple  time  scales  of  a  single  signal  (such  as 
building  energy  consumption).  Their  value  for  representing 
multi-scale  multivariable  input-output  models  is  still  in 
question. 
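
The univariate, multi-time-scale use of wavelets described above can be illustrated with the simplest wavelet, the Haar wavelet. The sketch below is a generic textbook-style decomposition and is not taken from the research results mentioned in the text; the function names and the sample energy data are assumptions made for illustration.

```python
def haar_step(signal):
    """One level of the (unnormalized) Haar transform.

    Returns pairwise averages (coarse scale) and pairwise half-differences
    (detail lost when moving to the coarse scale).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly coarsen the signal, keeping the detail at each scale."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


# Made-up energy readings over eight equal time intervals.
energy = [4, 6, 10, 12, 8, 6, 2, 0]
approx, details = haar_decompose(energy, levels=3)
print(approx)    # coarsest scale: the overall mean
print(details)   # finest-scale details first, coarsest last
```

Each entry of `details` describes variation at one time scale of a single signal, which is exactly the univariate setting the text identifies as the current limitation.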

4.2.  Fuzzy  logic 

A  distinguishing  feature  of  intelligent  control  systems,  as 
distinct  from  many  other  intelligent  systems,  is  the 
requirement  that  they  function  both  in  the  symbolic  and 
numerical  domains.  Fuzzy  logic  is  useful  in  intelligent 
control systems in large measure because it fulfills some
part of this requirement. Through the use of fuzzy
membership functions, raw numeric data can be translated into
linguistic labels. Furthermore, fuzzy rules allow heuristic
(and  symbolic/linguistic)  control  information  gained  from 
human  experts  to  be  used  for  continuous  control  applications. 

These features are often noted from the standpoint of
system-human communication, but they are also important for
inter-module (or inter-agent) communication in intelligent control
systems. Reasoning processes in these systems may be
symbolic,  but,  for  most  control  applications,  their  outcomes 
will  need  to  be  translated  into  continuous  quantities. 

The  relevance  of  fuzzy  logic  to  intelligent  control  is 
unquestioned,  but  the  current  state  and  directions  of  this 
technology  have  some  significant  limitations.  In  particular, 
popular  fuzzy  rule  formalisms  are  inconvenient  to  use  for 
more  than  small-scale  applications.  A  rule  in  a  fuzzy 
controller  maps  conjunctions  of  sensor  readings  to  a  controller 
output  through  a  simple  conditional  relationship.  More 
complex  rules  are  rarely  considered. 
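
The translation between numeric and linguistic domains discussed in this section can be sketched as follows. The example is a minimal illustration and makes several assumptions that are not in the text: the three labels, their triangular membership functions, the rule consequents, and the weighted-average defuzzification are all hypothetical, and each rule has a single antecedent rather than a conjunction of sensor readings.

```python
def triangular(x, left, peak, right):
    """Triangular membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic labels for a temperature error in degC.
LABELS = {
    "cold": (-4.0, -2.0, 0.0),
    "ok": (-2.0, 0.0, 2.0),
    "warm": (0.0, 2.0, 4.0),
}

# Hypothetical rules "IF error IS label THEN heater output IS value (%)".
RULES = {"cold": 80.0, "ok": 40.0, "warm": 0.0}


def fuzzify(error):
    """Numeric reading -> degrees of membership in each linguistic label."""
    return {name: triangular(error, *abc) for name, abc in LABELS.items()}


def heater_output(error):
    """Linguistic rules -> continuous command, by weighted average."""
    error = max(-2.0, min(2.0, error))  # clamp to the range the labels cover
    degrees = fuzzify(error)
    total = sum(degrees.values())
    return sum(degrees[name] * RULES[name] for name in RULES) / total


print(fuzzify(1.0))
print(heater_output(1.0))
```

Both directions of translation appear here: `fuzzify` moves from the numeric to the symbolic domain, and `heater_output` moves the outcome of symbolic rules back to a continuous quantity.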

4.3.  Self-organization  and  learning 

The  alternative  to  some  ability  for  self-organization/ 
learning/adaptation  in  a  control  system  is  to  have  every  facet 
of  the  control  scheme  manually  specified.  Whether  or  not 
this  is  seen  as  a  feasible  option  depends  largely  on  what  we 
mean  by  learning,  or  conversely  what  we  mean  by  manual 
specification.  Current  usage  betrays  some  inconsistencies. 
For  example,  training  a  neural  network  (even  off-line)  is 
typically  classified  as  learning,  whereas  linear  system 
identification  is  not. 

It  is  neither  accurate  nor  helpful  for  promoting  intelligent 
control  to  view  learning  and  adaptation  as  capabilities  that 
control  systems  either  do  or  do  not  exhibit.  Rather,  there  is  a 
spectrum  of  possibilities.  Towards  the  simpler  end  of  this 
spectrum,  the  widespread  use  of  system  identification,  gain 
scheduling,  and  even  feedback  linearization  demonstrates  the 
feasibility  of  learning  and  adaptation  in  some  form.  The  issue 
then  is  not  whether  to  include  learning  capabilities  in  future 
control  systems,  but  how  to  increase  their  learning 
capabilities  in  order  to  reduce  the  level  of  manual  effort 
currently  involved  in  control  design,  system  modeling,  and 
controller  modifications  in  the  face  of  unexpected  variations. 
It  is  inconceivable  that  human  involvement  can  be  completely 
dispensed  with,  just  as  it  is  inconceivable  that  our  current 
control  technology  would  have  progressed  to  the  point  it  has 
without  some  element  of  learning. 
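
As an illustration of the simpler end of this spectrum, the sketch below fits a first-order linear model to input-output data by least squares, which is system identification in its most elementary form. The model structure, the function name, and the simulated data are assumptions chosen for illustration; no noise is added, so the fit recovers the parameters used to generate the data.

```python
def identify_first_order(u, y):
    """Least-squares fit of the model y[k+1] = a*y[k] + b*u[k].

    u has N samples and y has N+1 samples. Solves the 2x2 normal
    equations directly with Cramer's rule.
    """
    n = len(y) - 1
    syy = sum(yk * yk for yk in y[:n])
    syu = sum(yk * uk for yk, uk in zip(y[:n], u))
    suu = sum(uk * uk for uk in u[:n])
    ry = sum(yk * yn for yk, yn in zip(y[:n], y[1:]))
    ru = sum(uk * yn for uk, yn in zip(u[:n], y[1:]))
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("data do not determine both parameters")
    a = (ry * suu - ru * syu) / det
    b = (ru * syy - ry * syu) / det
    return a, b


# Simulate a hypothetical process with known parameters, then "learn" them.
true_a, true_b = 0.8, 0.5
u = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
y = [0.0]
for uk in u:
    y.append(true_a * y[-1] + true_b * uk)

a, b = identify_first_order(u, y)
print(round(a, 6), round(b, 6))
```

Parameters are adjusted to fit observed data here just as in off-line neural network training, which is why the usual classification of one as learning and the other not is inconsistent.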

In  intelligent  systems  research,  the  interest  in  learning  has 
been  predominantly  geared  toward  improving  the  performance 
of  individual  modules  in  the  system.  Learning  in  a  semiotic 
system  has  another  important  implication:  the  generation  of 
new  communication  and  signification  capabilities.  Thus 
languages  for  inter-agent  communication  can  be  evolved  to 
allow  the  communication  of  appropriate  information  in  a 
sufficiently  expressive  yet  easily  interpreted  form. 

4.4.  Computational  linguistics 

As  we  broaden  our  view  of  control,  a  fundamental  insight  to 
contemplate is that human users of control systems need to be
part of the picture. The importance of human-system
interaction  increases  in  proportion  to  the  functionality  of  the 
control  system.  Further,  different  classes  of  users  must  be 
considered,  including  operators,  control  engineers, 
maintenance  supervisors,  and  others. 

Effective  human/control-system  interaction  requires 
appropriate  communication  mechanisms.  The  control 
community's  view  of  user  communication  needs  has  generally 
been  limited  to  graphical  and  tabular  displays  of  numeric  data. 
These  are  sufficient  for  most  low-level  control  functions,  but 
not  for  higher  levels.  Commands  from  users  at  these  levels, 
for  example,  will  not  be  setpoint  changes  but  more  complex 
instructions  which  may  be  most  naturally  expressed  in  some 
limited  linguistic  form  (as  illustrated  in  Section  3.2). 

Current  research  in  theoretical  linguistics  is  largely  in  the 
Chomskyan  generative  tradition.  The  hard  distinctions  drawn 
in  this  research,  such  as  competence  versus  performance  and 
syntax  versus  semantics,  with  the  subject  of  linguistic  study 
limited to syntactic competence, limit its direct relevance to
engineering  systems.  The  discipline  of  computational 
linguistics,  however,  addresses  all  aspects  of  linguistic  signs 
and  signification — syntactic,  semantic,  and  pragmatic. 
Computational linguistics exploits insights from modern
linguistics,  along  with  developments  in  less  analytic 
approaches  (e.g.,  statistical  and  connectionist  natural  language 
processing),  and  shows  considerable  promise  for  the 
development  of  effective  language  understanding  and 
generation  systems  for  focused  applications. 

The  immediate  relevance  of  computational  linguistics  to 
control  systems  is  in  the  pursuit  of  more  effective  and 
efficient  human-system  interaction.  However,  there  is  another 
intriguing  connection.  As  noted  earlier,  an  intelligent  control 
system  will  not  be  an  undifferentiated  block  of  software,  but 
will  be  composed  of  numerous  subsystems  many  of  which 
will  exhibit  some  aspects  of  intelligent  behavior.  Natural 
language  is  the  preferred  means  of  human  communication 
because  it  has  evolved  to  satisfy  the  communication  needs 
between  individual  intelligent  beings.  As  control  systems 
increase  in  sophistication,  functionality,  complexity  (and 
intelligence),  their  component  modules  will  too.  Parameter 
or  signal  vectors  will  not  provide  an  adequate  communication 
mechanism  between  these  modules.  A  full-fledged  natural 
language  capability  is  neither  required  nor  desirable,  but  the 
discipline  of  computational  linguistics  has  much  to  offer 
towards  devising  a  more  limited  language,  one  that  is 
qualitatively  richer  than  current  mechanisms. 

4.5.  Classical  control 

Neither  humans  nor  intelligent  controllers  are  directly 
responsible  for  the  control  of  today's  aircraft,  manufacturing 
plants,  automobiles,  or  any  of  the  other  domains  that 
represent  the  true  successes  of  control.  The  credit,  in  large 
part,  goes  to  our  classical  control  technology.  Intelligent 
control  must  build  on  this  proven  foundation,  and  it  must 
provide  the  framework  for  classical  control  techniques  to 
continue  to  serve  their  many  useful  purposes.  As  we  discuss 
the  need  for  semiotically-sophisticated  control,  and  for  new 
codes and signification capabilities, it is important to
remember  that  such  semiotic  capabilities  as  are  incorporated 
in  current  control  systems  must  be  maintained.  Even  as  we 
suggest  the  need  for  linguistic  communication,  for  example, 
at  some  level  of  an  intelligent  controller  there  will  still  be  a 
need  to  communicate  linear  transfer  function  matrices. 
Analogously,  the  imprecision  of  fuzzy  logic  can  supplement, 
but  it  cannot  supplant,  the  expression  of  model  uncertainty  in 
the μ framework.

6.  Concluding  Remarks 

A  matter  of  definition  lurks  behind  the  scene  in  this  paper:  is 
the  broader  vision  of  control  sketched  above  overstepping  the 
boundaries  of  the  discipline?  In  fact,  the  vision  of  control  has 
always  been  a  broad,  overarching  one.  Any  introductory 
control  textbook  will  first  present  some  examples  of  target 
systems  for  control  technology,  and  these  can  encompass 
national  economies,  global  ecologies,  and  similarly  large- 
scale,  complex  domains  (see  Fig.  3).  Unfortunately,  the  logic 
of  control  often  belies  its  rhetoric.  A  semiotic  perspective  is 
fully  compatible  with  the  classical  vision  of  control,  even  if 
the  practice  of  control  science  and  engineering  shows  little 
evidence  of  this  connection. 


Although  in  this  paper  I  have  focused  on  the  role  of  semiotics 
in  intelligent  control  systems,  I  have  attempted  neither  to 
define  intelligent  control  nor  to  separate  it  from  any  other 
adjectival  variation — definitions  of  intelligence  in  any  context 
are  of  problematic  value.  I  propose  not  a  definition  but  a 
guiding  principle:  the  degree  of  intelligence  in  a  control 
system  is  in  direct  proportion  to  the  sophistication  of  its 
semiotic  character. 

Fig.  3.  Visions of Control. From Richard C. Dorf, Modern Control Systems, Addison-Wesley, 1974. As adapted from The Limits to Growth: A Report for the Club of Rome's Project on the Predicament of Mankind, 2nd ed., D.H. Meadows et al. New York: Universe Books, 1972. Used with permission of Potomac Associates, Washington, D.C.

ABSTRACT

A direct consequence of increasing system complexity is a more
complicated, less observable, and less controllable trajectory in
phase space, in the classical sense. Adequate control should allow
more freedom in the decision process, to compensate for imprecise
system modeling and for the poor handling of information flux that
results from excessive centralization. An approach more global than
the classical state space is now required, to reflect the wider
wandering of trajectories and their natural organization along
invariants in phase space, within which they become
indistinguishable. The most elementary is the function space
approach, where the "unit" is a whole trajectory and the control
problem becomes that of embedding the system solution into an
initially prescribed function space, chosen from asymptotic
stability and monotonicity conditions. This "functional" control
strongly reduces the information flux up and down the system
structure by taking advantage of the internal self-organization of
the system that follows from its complexification. The hierarchical
structure here satisfies the subsidiarity principle: more
"intelligence" is delegated to lower levels, and more freedom is
left for decision processes at higher levels, where fuzzy
controllers are best suited, opening the way to intelligent
self-deciding systems.

KEYWORDS:  Complex  System,  Intelligent  Control, 
Stability,  Functional  Control 

I.  INTRODUCTION 

To accomplish more complex technical tasks with more complex
systems [1], as modern industries must in order to survive in a
worldwide economy, one might rely on technological advances in the
various system components and, in parallel, increase control
complexity with more sensor information. Quite apart from the
resulting increase in failures, with all its consequences, this
approach differs from that of existing living systems, which
exhibit a remarkable ability to decide and to perform extremely
complex tasks with their complex "living" machine. This suggests
that another, less strictly computerized and more qualitative way
is possible; some elements in this direction are given in the
following. For Lagrangian dynamical systems, classical PD-type
controllers locally give system trajectories a stable focus-type
structure, guaranteeing asymptotic stability of the trajectory. For
more complicated systems, the concepts of "adaptiveness" [2] and
"robustness" [3] have been analyzed. They can be represented in
symbolic "coordinates" (system complexity, performance level) as
arrows along the axes, starting from the initial simple PID control
domain; see Fig. 1. Adaptive control is asymptotically stable by
construction, but beyond some level system complexity and
performance analysis cannot be handled together in real time,
leading to a tradeoff. In the robustness approach, the simple
stability that results from system simplification rules out the
higher precision required for more advanced performance, so that
both extensions are mainly localized along the two axes. On the
other hand, since elementary performance is not worth seeking from
a complex system, and high-level performance is not usually
obtained from a simple system, the representative point of a real
system should stay inside a band along the first diagonal.
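
The statement that PD-type control gives trajectories a stable focus-type structure can be checked on the simplest case. The sketch below assumes a single degree of freedom with linearized error dynamics m x'' = u and the PD law u = -kp x - kd x'; these dynamics, the gain values, and the function names are illustrative assumptions, not the general Lagrangian system treated in this paper.

```python
import cmath


def pd_closed_loop_poles(mass, kp, kd):
    """Roots of mass*s**2 + kd*s + kp = 0 for a single-DOF PD loop."""
    disc = cmath.sqrt(kd * kd - 4.0 * mass * kp)
    return (-kd + disc) / (2.0 * mass), (-kd - disc) / (2.0 * mass)


def is_stable_focus(poles):
    """Stable focus: complex-conjugate poles with negative real part."""
    return all(p.real < 0.0 and p.imag != 0.0 for p in poles)


# Light damping: the error spirals into the origin (stable focus).
poles = pd_closed_loop_poles(mass=1.0, kp=4.0, kd=2.0)
print(poles, is_stable_focus(poles))

# Heavy damping: still stable, but a node rather than a focus.
heavy = pd_closed_loop_poles(mass=1.0, kp=4.0, kd=5.0)
print(heavy, is_stable_focus(heavy))
```

In this toy case the focus structure appears whenever kd squared is smaller than 4 times mass times kp, with kd positive.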

This means that both previous approaches suffer from the defect of
not changing the system structure, which cannot remain too
centralized beyond some level of complexity. Improving the
circulation of information flux requires delegating more decision
capability to the system parts themselves as system complexity
increases, i.e., devolving more intelligence to local controls,
which become of reflex type, so that the upward information flux is
strongly reduced and higher-level controllers have an affordable
amount of material to deal with. Fuzzy controllers [4] are singled
out here, as they would provide a very adequate link between the
quantitative description at the system dynamics level and the
qualitative representation at the higher, intelligent decision
level. Because they split off from the diagonal band, the previous
control approaches are not suited for extension to intelligent
higher-level control: the gap between their (too detailed)
representative variables and the (more global) ones suited to
intelligent control leads to an unaffordable combinatorial
explosion of logical rules with system complexity.

Convergence toward the diagonal band is nevertheless possible, as
living systems show every day by changing the nature of control
from trajectory space to task space, which is made possible by the
huge experience collected by each individual. For complex
mechanical or technical systems, it is also possible to converge
toward the diagonal band with a new type of control, called
functional control [5], based on intermediate "natural" variables
that fill this gap, and with which a multilevel control integrating
decision functions into system control can be constructed. The main
idea is that when system complexity increases, its trajectory in
phase space becomes more complicated, but at the same time the
information content of its observation decreases, because
trajectories become less distinguishable, i.e., become equivalent
in a dynamic sense. They fill domains in phase space more and more
regularly, and initial conditions (local invariants) can no longer
be used to define a trajectory. Only more global invariants [6] can
be used to classify trajectories into groups within which single
trajectories can no longer be distinguished. Examples are magnetic
field lines, which are not distinguishable on their flux surfaces;
the extreme case is a gas in thermodynamics, whose trajectories on
the energy surface are all equivalent.
Another quite frequent situation occurs when, due to
nonlinearities, there exist bifurcation points at which the system
dynamics branch off toward more complicated trajectories, up to
chaotic ones [7]. In all cases, the invariants are no longer the
initial trajectory coordinates, as in the classical mechanical
problem, but the corresponding global invariants, for instance the
flux surface for a magnetic field and the energy surface for a gas.
Here part of the trajectory behavior escapes direct control, in the
sense that, because the number of degrees of freedom (DOF) of the
state vector is much larger than that of the control vector, the
system dynamics organize themselves on invariant surfaces. It is
interesting to note that in this case the state vector may still be
observed, but not all of its components are controllable.
Consequently, the control problem becomes one of acting more
globally on the system trajectory as a whole rather than correcting
it locally, and of requiring the solution of the system dynamics to
belong to a prescribed function space. This space is chosen from
the general properties desired of the functional behavior
(overshoot, decay, monotonicity, ...) and for compatibility with
the dynamical behaviors of the system, which should belong to it.
This is the only way to take advantage of the self-organization of
the system that results from its complexification, and to eliminate
the risk of antagonizing this self-organization through inadequate
control. The resulting "embedding" problem fits into the framework
of the Fixed Point Theorem [8], and asymptotic stability toward the
desired trajectory may be guaranteed.
To deal technically with the complication of system dynamics and
the difficulty of representing physical phenomena correctly, an
efficient way would be to combine the advantages of functional and
robust controls, i.e., asymptotic stability and simplicity.
Analysis of the regular robustness approach shows that its simple
stability is due to the narrowness of the function space allowed
when applying the Lyapunov Theorem, which rules out application of
the Fixed Point Theorem, whereas an adequate enlargement of the
function space leads to asymptotic stability without requiring
extra system information. In the next part, the reduction of the
dynamical system equations to canonical form is given. The
structure of classical robust control is analyzed afterward,
together with its inherent defects, and an improved robust control
is explicitly constructed. The resulting asymptotic robust control
is of simple form and is easily implementable. In contrast, a fuzzy
control law, even when expressed analytically in terms of system
variables to save basic combinatorial computation, requires their
approximate evaluation, making the approach unusable at this
detailed level for complex systems of large dimension. Application
to an N-link actuated, compliant, and deformable mechanical system
is given as an example.

II.  CANONICAL  ERROR  EQUATIONS 

Consider, for $t > t_0$, the dynamical system

$$\frac{dX}{dt} = F(X, U, t) \qquad (1)$$

with state vector $X = \mathrm{col}(X_1, X_2, \ldots, X_n)$ and
control vector $U$. $F(\cdot,\cdot,\cdot)$ is supposed smooth
enough, to appropriate order in its arguments, that a solution
$X = X(t, X_0)$ exists for a fixed $U = U_j(t)$ with initial
condition $X_0 = X(t_0)$, with possibly different behaviors as
$t \to \infty$ for different $X_0$. Conversely, for a fixed
$X = X_d(t)$ there exists for $U$ one (or more) solution of
eqn (1), $U_d = F^{-1}(X_d, dX_d/dt, t)$, so in principle $U$ may
be determined so that eqn (1) has a solution behaving in a
prescribed way. System trajectories close to $X_d(t)$ have to be
analyzed to determine the conditions for their convergence to
$X_d(t)$ for large $t$. Since the functional dependence of $F$ is
not well known in real cases, a much simpler structural stability
analysis is called for. To proceed, the RHS of eqn (1) is split
into a "simple" part, for which a control exists that guarantees
asymptotic convergence, and a "rest" whose effect is upper bounded.
An equivalence class with respect to this "rest effect" is realized
if an additional control exists that also gives the complete system
the asymptotic convergence property, thus reducing the analysis to
the "skeleton" systems out of which these equivalence classes are
constructed. This will be called a canonical reduction.
With $X = X_d + x$, $U = U_d + u$, the variational error system
from eqn(1) is

$$\frac{dx}{dt} = A_0 x + B(u_0 + \Delta u) + \eta \qquad (2)$$

splitting the variational control $u$ into $u_0$, guaranteeing
asymptotic convergence of the "skeleton" equation with $\eta = 0$,
and an additional part $\Delta u$ to annihilate the action of the
extra term $\eta$, which contains both the nonlinear and the error
terms arising when passing from eqn(1) to eqn(2). Eqn(2) is useful
if there exists for the skeleton equation a P-type, possibly time
dependent, control such that, with $u_0 = -Kx$, there exist
positive definite matrices $(P, Q)$ satisfying
$(A_0^T - K^T B^T)P + P(A_0 - BK) + dP/dt = -Q$, so that the
Lyapounov stability theorem applies with $V = x^T P x$,
$dV/dt = -x^T Q x$. $\Delta u$ should now be found such that, with
the extra term $\eta$, the asymptotic convergence property is
maintained for eqn(2).
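
As a minimal numerical sketch of the skeleton condition, assume a
hypothetical double-integrator skeleton $A_0 = [[0,1],[0,0]]$,
$B = [0,1]^T$ with constant PD gain $K = [k_p, k_d]$ (none of these
values come from the paper). With $u_0 = -Kx$ the closed-loop matrix
is $A_0 - BK$, and the time-invariant Lyapounov equation
$(A_0 - BK)^T P + P(A_0 - BK) = -Q$ can be solved in closed form for
$Q = I$:

```python
# Sketch: solve (A0 - B K)^T P + P (A0 - B K) = -Q for a 2x2 example.
# Assumptions (not from the paper): double-integrator skeleton,
# constant gains kp, kd > 0, and Q = identity.

def closed_loop(kp, kd):
    """A = A0 - B K for A0 = [[0, 1], [0, 0]], B = [0, 1]^T, K = [kp, kd]."""
    return [[0.0, 1.0], [-kp, -kd]]

def solve_lyapunov(kp, kd):
    """Closed-form symmetric P for Q = I (valid for kp > 0, kd > 0)."""
    p12 = 1.0 / (2.0 * kp)
    p22 = (1.0 + 2.0 * p12) / (2.0 * kd)
    p11 = kd * p12 + kp * p22
    return [[p11, p12], [p12, p22]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def lyapunov_residual(a, p):
    """Return A^T P + P A + I, which should vanish."""
    left = matmul(transpose(a), p)
    right = matmul(p, a)
    return [[left[i][j] + right[i][j] + (1.0 if i == j else 0.0)
             for j in range(2)] for i in range(2)]

def is_positive_definite(p):
    """Sylvester criterion for a symmetric 2x2 matrix."""
    return p[0][0] > 0 and p[0][0] * p[1][1] - p[0][1] * p[1][0] > 0

A = closed_loop(kp=2.0, kd=3.0)
P = solve_lyapunov(kp=2.0, kd=3.0)
print(P, is_positive_definite(P))
```

A positive definite $P$ confirms that $V = x^T P x$ is a valid
Lyapounov function for this particular skeleton; larger systems would
need a general linear solver instead of the closed form used here.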

III.  IMPROVED  ROBUST  CONTROL 
ANALYSIS 

The derivative of $V = x^T P x$ along trajectories of eqn(2) is

$$\frac{dV}{dt} = -x^T Q x + 2(B^T P x)^T \Delta u + 2(Px)^T \eta \qquad (3)$$

With the bound $\|\eta\| \le \rho$, $\rho > 0$, not specified for
the moment, the usual robust method is defined by the additional
control [9]

$$\Delta u_R = -\rho\,(B^T B)^{-1} B^T P x\,\frac{1}{\mathrm{Sup}(\varepsilon, \|Px\|)} \qquad (4)$$

where $\varepsilon > 0$ is small, leading to the bound

[bound not recoverable from the source] (5)

with $y = \|Px\|$, $\rho^L = \mathrm{Sup}_t\,\rho(y,t)$, and
$\lambda_{\min}(Q) = \mathrm{Min}_{j,t}\,\lambda_j(Q)$. For bounded
$\rho^L$ with $\rho^L(0) \ne 0$, there exists $y = y_m$ below which
$dV/dt$ becomes positive, defining an attractor inside the ball
$x_m = P^{-1} y_m$. All trajectories will hit its surface after a
finite time and remain captured inside thereafter. Its size is
small for small $\varepsilon$ and large $\lambda_{\min}(Q)$, i.e.
for large gains. So with the usual robustness approach, the initial
system uncertainty due to the nonzero norm-bounded terms $\eta$
cannot be completely removed, and it leads to a state uncertainty
inside a finite ball $\|x_m\|$ on the system error dynamics, giving
a simple stability result after a first exponential decay period.
Because of the conservativeness of the sufficient stability
condition in the Lyapounov method, this decay cannot be satisfied
for all time. Also, for $\Delta u_R = 0$ and $\eta \ne 0$, one gets
$dV/dt \le -x^T Q x + 2\rho\|Px\| \le -\lambda V + 2k\rho^L V^{1/2}$,
with $\lambda$ and $k$ from bounds on the eigenvalues of $P$ and
$Q$. $V$ is upper bounded through

$t = \int^{V^{1/2}} [k\rho^L(x^2) - (\lambda/2)x]^{-1}\,dx$, with
exponential behavior
$\Delta x \sim \exp[(2k x_m \rho'^L_m - \lambda/2)t]$ close to a
zero $x_m$ of the integrand denominator, stable or unstable
depending on the sign of the coefficient of $t$. The final behavior
of $V(t)$ depends on the number of possible $x_m$. Thus classical
robust control only changes the size of the induced uncertainty
ball, but not its existence.
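
The residual ball can be seen on a scalar toy problem. The sketch
below assumes a hypothetical scalar error equation
$dx/dt = -a x + \Delta u + \eta$ with $B = P = 1$, a constant
worst-case disturbance $\eta = \rho$, and the saturated law of
eqn(4) read as $\Delta u_R = -\rho x/\mathrm{Sup}(\varepsilon, |x|)$;
all numerical values are illustrative. Inside $|x| < \varepsilon$
the law is linear, so the error settles at
$x^* = \rho\varepsilon/(a\varepsilon + \rho)$ rather than at zero:

```python
# Sketch: classical robust control leaves a nonzero residual error.
# Assumptions (illustrative, not from the paper): scalar system with
# B = P = 1, constant disturbance eta = rho, explicit Euler integration.

def classical_robust_control(x, rho, eps):
    """Scalar form of the saturated law: -rho * x / max(eps, |x|)."""
    return -rho * x / max(eps, abs(x))

def simulate_classical(x0, a=1.0, rho=1.0, eps=0.1, dt=1e-3, t_end=20.0):
    """Integrate dx/dt = -a*x + du + eta with eta = rho."""
    x = x0
    for _ in range(int(t_end / dt)):
        du = classical_robust_control(x, rho, eps)
        x += dt * (-a * x + du + rho)
    return x

def residual_error(a=1.0, rho=1.0, eps=0.1):
    """Equilibrium of the linear zone |x| < eps."""
    return rho * eps / (a * eps + rho)

x_final = simulate_classical(x0=2.0)
print(x_final, residual_error())
```

Reducing `eps` shrinks the residual error but never removes it, which
is the scalar counterpart of the statement that only the size of the
uncertainty ball changes.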

A less conservative condition follows from using a different
algebraic form of the additional control. Instead of $\Delta u_R$
from eqn(4), let

$$\Delta u_{IM} = -\frac{\rho\,(B^T B)^{-1} B^T P x}{\alpha\|Px\| + \varepsilon f(t)} \qquad (6)$$

with $\alpha < 1$, and a "driving" function $f(t)$ to be discussed
later. $V(t)$ is bounded above by the solution of the majorant
equation

$$\frac{dY}{dt} = -\lambda Y + \frac{2\varepsilon}{\alpha}\,\rho f(t) \qquad (7)$$

where $f(t)$ is determined such that, for a given functional form
of $\rho(Y, t)$, the solution $Y(t)$ of eqn(7) exhibits a
prescribed time dependence $Y_d(t)$. Note that with $f(t)$ much
larger freedom exists now than in eqn(5) with only the number
$\varepsilon$, and a simple calculation would give $f(t)$ for
prescribed $Y_d(t)$ from eqn(7). However, it is more appropriate in
the robustness context to solve an embedding problem in which a
correspondence is sought between function spaces $\mathcal{F}$ and
$\mathcal{Y}$ for various possible functional forms of
$\rho(Y, t)$, with $f(t) \in \mathcal{F}$ and $Y \in \mathcal{Y}$,
and where $\mathcal{Y}$ is fixed by general properties such as
continuity and behavior (decay) for large $t$. The usual approach
is to determine $\mathcal{Y}$ from $\mathcal{F}$ and $\rho(Y, t)$
via the Fixed Point Theorem. Specific applications are obtained
when $\mathcal{Y}$ is a usual Lebesgue $L^p$ or Sobolev $W^k_p$
space and $\rho(Y, t)$ ranges in a similar space, as when
$\rho(Y(t), t)$ satisfies the Caratheodory condition [10]

$$\rho(Y(t), t) \le a(t) + b(t)\,Y(t)^{\gamma} \qquad (8)$$

with $a(t)$ and $b(t)$ in adapted spaces, by use of the Holder
inequality on eqn(7) to determine the order of $\mathcal{F}$, i.e.
the power $q$ in $f(t) = k_1 Y(t)^q$, chosen to eliminate the
explicit time dependence of the control. Eqn(8) follows from
thermodynamic properties in natural systems, and is obtained for
systems with real, physical or technical, origin. Asymptotic
stability is now obtained for a large class of functions
$\rho(Y(t), t)$, in contrast to the previous robust result, but
with decay weaker than exponential.
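
The step from eqn(3) to the majorant equation can be sketched as
follows, under the extra assumption (not stated in the text) that
$B(B^T B)^{-1} B^T$ acts as the identity on $Px$, e.g. for a square
invertible $B$, and reading the improved law as
$\Delta u_{IM} = -\rho (B^T B)^{-1} B^T P x / (\alpha\|Px\| + \varepsilon f)$
with $\|\eta\| \le \rho$, $0 < \alpha < 1$ and $f > 0$:

```latex
\begin{aligned}
\frac{dV}{dt}
 &= -x^T Q x + 2 (B^T P x)^T \Delta u_{IM} + 2 (P x)^T \eta \\
 &\le -x^T Q x - \frac{2 \rho \|P x\|^2}{\alpha \|P x\| + \varepsilon f}
      + 2 \rho \|P x\| \\
 &= -x^T Q x + 2 \rho \|P x\|\,
      \frac{(\alpha - 1)\|P x\| + \varepsilon f}{\alpha \|P x\| + \varepsilon f} \\
 &\le -x^T Q x + 2 \rho\, \varepsilon f\,
      \frac{\|P x\|}{\alpha \|P x\| + \varepsilon f}
 \;\le\; -\lambda V + \frac{2 \varepsilon}{\alpha}\, \rho f .
\end{aligned}
```

Here $\lambda$ comes from the eigenvalue bounds on $P$ and $Q$, the
third line drops the non-positive term $(\alpha - 1)\|Px\|$, and the
last step uses $\|Px\|/(\alpha\|Px\| + \varepsilon f) \le 1/\alpha$.
The right-hand side has the structure of the majorant equation for
$Y(t)$.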

When the inequality $\rho(Y(t)\exp(\mu t), t) \le \rho(Y(t))\,h(t)$
holds with a nondecreasing function $\rho(\cdot)$, one gets

$$Y(t) \le \exp(-\mu t)\; G^{-1}\big(G(Y_0) + 2\,\mathrm{sgn}(G)\,I(t)\big) \qquad (9)$$

with $I(t) = (\varepsilon/\alpha)\int_0^t \exp(\mu t')\,f(t')\,h(t')\,dt'$,
where $G(x) = \int^x dv/\rho(v)$. When $G(\cdot)$ is sublinear,
$\mathrm{sgn}(G(\cdot)) > 0$ and $Y(t)$ is defined for any
$[Y(0), t]$, whereas if $G(\cdot)$ is superlinear,
$\mathrm{sgn}(G(\cdot)) < 0$ and $Y(t)$ is only defined for
$Y(0) \in [0, Y_c]$ such that the argument of $G^{-1}$ does not
change sign for $t < +\infty$, i.e.
$G(Y_c) = 2(\varepsilon/\alpha)\int_0^\infty \exp(\mu t')\,f(t')\,h(t')\,dt'$,
in order to avoid finite-time Lagrange instability. Here $Y(t)$ may
decay faster than exponential, but in a limited ball of initial
conditions depending on $f(t)$. When $\mu = 0$, $h(t) = 1$, $Y(t)$
in eqn(7) is bounded by
$\int_{Y_0}^{Y(t)} [u - L\rho(u)u^q]^{-1}\,du = -\lambda t$ in
implicit form with $L = 2\varepsilon k_1/(\alpha\lambda)$, and
decays asymptotically to 0 for large $t$ under the simple explicit
conditions

$$q > 1 > L\,\rho(Y_0)\,Y_0^{q-1} \qquad (10)$$

relating system and control parameters for a given initial ball
$Y_0$, best satisfied for large linear gains and a small initial
ball. In all cases, with a control of the generic canonical
"triplet" form $U = U_d + u_0 + \Delta u$, asymptotic stability can
be found in conjunction with the robustness constraint, in contrast
to the classical approach, and the role of $f(t)$, which drives the
behavior of $Y(t)$, is clearly shown.
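
A quick numerical check of condition (10) is possible on the majorant
equation itself. The sketch below assumes $\mu = 0$, $h = 1$, a
driving function $f = k_1 Y^q$, and a hypothetical bound
$\rho(Y) = a + bY$; with these choices the majorant equation is read
here as $dY/dt = -\lambda[Y - L\rho(Y)Y^q]$ with
$L = 2\varepsilon k_1/(\alpha\lambda)$. Parameter values are
illustrative only.

```python
# Sketch: decay of the majorant Y(t) versus the explicit condition
# 1 > L * rho(Y0) * Y0**(q - 1). Assumptions (not from the paper):
# rho(Y) = a + b*Y, f = k1 * Y**q, explicit Euler integration.

def rho(y, a=1.0, b=1.0):
    return a + b * y

def gain_L(eps=0.1, k1=1.0, alpha=0.5, lam=1.0):
    return 2.0 * eps * k1 / (alpha * lam)

def condition_holds(y0, q=2.0):
    """Explicit decay condition for initial value y0 (q > 1 assumed)."""
    return q > 1.0 and gain_L() * rho(y0) * y0 ** (q - 1.0) < 1.0

def simulate_majorant(y0, q=2.0, lam=1.0, dt=1e-3, t_end=20.0, cap=1e3):
    """Integrate dY/dt = -lam * (Y - L * rho(Y) * Y**q); stop if Y > cap."""
    y = y0
    big_l = gain_L(lam=lam)
    for _ in range(int(t_end / dt)):
        y += dt * (-lam) * (y - big_l * rho(y) * y ** q)
        if y > cap:          # finite-time growth: stop integrating
            break
    return y

for y0 in (1.0, 1.5):
    print(y0, condition_holds(y0), simulate_majorant(y0))
```

With these numbers the initial value 1.0 satisfies the condition and
decays, while 1.5 violates it and grows, illustrating why the
condition is best satisfied for a small initial ball.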

IV.  EXPLICIT  FUZZY  CONTROL 

To escape from a detailed description of the dynamical system,
another way is to set conditional "linguistic" statements in the
form of if-then rules [11], which allow passing directly to a
generic descriptive representation of system trajectories in
geometric space. However, this is only possible when their
qualitative nature is already known, which rests upon the
correspondence between the minimum number of rules in rule space
($\mathcal{R}$) and the types of trajectories in geometric space
($\mathcal{T}$), only known in very few cases, despite a global
relation from the universal approximator property [12]. One case
corresponds to the simple "driving car" rules [13], and applies to
focus-type (or node-type) trajectories, produced at least locally
by Lagrangian equations [14]. Again in this case, the combinatorial
explosion for large enough dimension renders this approach useless,
indicating that the variables to which the control method is
applied are not good ones because they are too detailed, and
suggesting a direct search for the explicit final defuzzified
control law to avoid all intermediate steps.

Consider for simplification fuzzification under translationally
invariant membership functions $f_i(x) = f(x - i\Delta_1)$ over a
normalized interval and such that $f_{i-1}(\Delta_1) = 1$,
$f_i(\Delta_1) = 0$, guaranteeing always only two nonzero
membership values $m_{i-1}$ and $m_i$ for any $0 < x < \Delta_1$,
see Fig. 2. Similarly, after the usual max-min compositional rules
of inference, defuzzification by the centroid method with
translationally invariant output membership functions
$g_j(y) = g(y - j\Delta_2)$ gives the analytic crisp output control
$u = u_{PD} + \Delta u_{FUZ}$, with the usual PD part
$u_{PD} = \frac{1}{2}[y_{k-1} + y_k]$ in normalized form and the
fuzzy contribution
$\Delta u_{FUZ} = (\Delta_2/2)(1 + A_{k-1}/A_k)^{-1}$, with domains
$A_{k-1}$ and $A_k$ shown on Fig. 2 and

Ak-i  _  ™.i[zi  +Inf(y0,zi)  +  2Sup(701,0)] 
Ak    ~  In  +  z9(A2y)\H  {  > 

renormalizing $y$ to $\Delta_2$, with $y(m_j) = y_j$,
$z_j = 1 - y_j$, $I_{jk} = \int_{y_j}^{y_k} g(\Delta_2 y)\,dy$, and
intersection point $y_0 = 1/2$ between two contiguous membership
functions corresponding to $g(\Delta_2 y_0) = m_0$ and
$g(\Delta_2 y(m_k)) = g(\Delta_2) m_k$, writing $m_{k-1} = m_1$,
$m_k = m_2$ with the translational invariance property of
membership functions. Note that $F_k(m_k, 0) = 1$ and
$F_k(m, m) = 0$. With the constraint
$m_{k-1} + m_k = m_1 + m_2 = 1$, and $f(\Delta_1 x) = m$ from the
input with normalized $x$, $F_k(1 - m, m)$ can be thoroughly
studied for $m \in [0, 1/2]$, with slopes
$-2\big(\int_0^1 g(\Delta_2 t)\,dt\big)^{-1}$ at $m = 0$ and 0 at
$m = 1/2$. With for instance the generic membership function
$g(y) = y^a$ and $a > 0$, the steepening increases with $a$, i.e.
with pitching.
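
The "only two nonzero memberships" property is easy to check
numerically. The sketch below assumes a triangular shape for $f$
(one common choice; the paper does not fix the shape) and the
hypothetical helper names `tri`, `memberships` and `active`:

```python
# Sketch: translationally invariant memberships f_i(x) = f(x - i*delta).
# Assumption (not from the paper): f is triangular with half-width delta,
# so neighbouring functions cross at membership value 1/2.

def tri(s):
    """Unit triangle: 1 at s = 0, falling linearly to 0 at |s| = 1."""
    return max(0.0, 1.0 - abs(s))

def memberships(x, delta, n):
    """Membership values m_0 .. m_{n-1} of x for centres i * delta."""
    return [tri((x - i * delta) / delta) for i in range(n)]

def active(ms, tol=1e-12):
    """Indices whose membership is nonzero."""
    return [i for i, m in enumerate(ms) if m > tol]

m = memberships(x=0.3, delta=0.25, n=5)
print(active(m), sum(m))
```

For any input inside the covered interval, at most two adjacent
indices are active and their memberships sum to one, which matches
the constraint $m_{k-1} + m_k = 1$ used in the text.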

So one may use only the final explicit control expression in
eqn(11) once the system variables are properly located in their
membership domains. This only requires the inverse functions
$X = f^{-1}(m)$ and $Y = g^{-1}(m)$, but the problem still lies in
the evaluation of all system coordinates, with flags when crossing
a membership domain. Here the advantage of imprecision is
unfavorably balanced by the number of variables to treat, showing
again that the high quality of the fuzzy controller is not fully
exploited at this too detailed description level, as compared with
the very global and limited knowledge required by the robust
control given in eqn(6).

V. APPLICATION TO N-LINK MECHANICAL
SYSTEM

Canonical error equations for actuated mechanical systems with
compliance at joints and with flexion and torsion link deformations
can be exactly cast into the form [15] of eqn(2), with
$x = \mathrm{col}(e, de/dt, Z, dZ/dt)$,
$\eta = \mathrm{col}(\eta_1, \eta_2)$,
$\Delta u = \mathrm{col}(\Delta u_0, \Delta u_1)$, $A = A_0 - BK$,
and error terms with regressors associated with both system
parameters and deformation parameters, obtained as the difference
between exact and estimated quantities in the initial system.
Application of the control $\Delta u_{FUZ}$ is only possible here
if all elements of the column vector $x$ are evaluated, which is
difficult for the last component. On the contrary, from analysis of
the previous expressions one gets the bound
$\|\eta_j\| \le A_j\|X\| + B_j = \rho_j$ of Caratheodory form,
allowing direct application of eqn(10). With a driving function
$f(t)$ of power form with exponent $q$, asymptotic stability is
obtained if the conditions

$$q > 1, \qquad 1 > L\,(1 + \beta\|X(0)\|)\,\|X(0)\|^{q-1} \qquad (12)$$

are satisfied by proper choice of the parameters $\alpha$,
$\varepsilon$, $k$, $q$, with $L = 2\varepsilon a k/(\lambda\alpha)$,
$\beta = b/a$, and $a, b > 0$ obtained from $A_j$, $B_j$ and
supposed constant in time with appropriate upper bounding. So for
the N-link mechanical system a large class of decaying driving
functions is found, which monitor the system decay and guarantee
both robustness and asymptotic stability with knowledge only of the
basic skeleton parameters and of the error ball for incorrectly
estimated parameters, in contrast to the usual robust analysis,
which shows only the simple stability property, and to other
control approaches requiring more detailed knowledge of system
parameters. The present better result is due to taking full
advantage of the Caratheodory-type constraint and to less stringent
conditions on the majorant of the error vector norm, used as a
Lyapounov function. The Lyapounov conditions for asymptotic
stability are not satisfied, but the majorant still fulfills
(weaker) conditions leading to asymptotic (not necessarily
exponential) decay. A numerical application to a 3-dof system is
given in [16].

VI. CONCLUSION

Error equations for dynamical systems are canonically reduced to a
generic form composed of a (usually linearized) part coming from
the known linear terms, and a residual nonlinear one coming from
the unavoidable mismatch between the system representation and its
actual (unknown) form, both at the parametric level and in the
functional form of the functions modeling physical phenomena
(friction, contact interactions). The control should give the
solution of the error equations the best convergence to the origin.
A first component, from the linearized part, is of PD type, usually
giving locally weakly damped oscillatory behavior because power
limitations restrict the allowed gains. The resulting asymptotic
convergence is not structurally stable because the residual part,
when norm-bounded, leads only to simple stability of the error
system solution. So an additional controller is needed for further
improvement. The simplest form is a robust one requiring knowledge
of the norm bound, classically leading to a simple stability result
because the condition that the error vector norm be majorized by an
exponentially decaying function cannot be maintained for all time.

A new form of robust controller allowing a larger function space
results from the introduction of a driving function f(t), whose
proper design guarantees asymptotic decay depending on the
functional form of the norm bound, even when the Lyapounov
conditions are not satisfied. The controller combines the
simplicity of the conventional robust controller, which requires
only information on the norm bound of the residual terms, with the
efficiency and precision of trajectory tracking obtained by
maintaining structurally stable asymptotic convergence. So here
results are obtained in real systems at the lowest level of the
mechanical structure. In contrast to the fuzzy controller, which
even in explicit final form after application of the logical rules
demands evaluation of the complete system state vector, very little
information is required by the newly proposed robust controller, so
its description at a higher level is extremely simplified, easing
the role of higher-loop decisional fuzzy-type adapted
controllers [14].

Fig. 2: Membership Functions and Areas Ai.
A1 = Domain(A, B, C, D), A2 = Domain(E, F, G, C)

Abstract.

This  paper  is  an  effort  to  address  the  issues  of 
planning  and  learning  as  two  components  of  the  same 
process  of  intelligent  system  functioning. 

Key  words:  behavior,  goal,  path, 
planning,  redundancy,  resolution,  representation, 
tessellatum,  trajectory 

1.  Planning  Problems  of  Behavior 
Generation 

Behavior  Generation  [1]  can  have  many 
mechanisms  of  planning  and  execution.  At  the 
present  time,  these  mechanisms  cannot  be 
considered  as  thoroughly  known,  and  the  general 
theory  of  planning  can  hardly  be  attempted.  We  will 
discuss  a  subset  of  problems  in  which  the  goal  is 
defined  as  attainment  of  a  particular  state.  Other 
types  of  problems  can  also  be  imagined:  in  chess 
the  goal  is  clear  (to  win)  but  this  goal  cannot  be 
achieved  by  achieving  a  particular  position  in  a 
space  (even  in  a  descriptive  space.)  Most  of  the 
problems  related  to  the  theory  of  games  and  linked 
with  pursuit  and  evasion  are  characterized  by  a 
similar  predicament  and  are  not  discussed  here. 

Planning  is  understood  as  searching  for 
appropriate  future  trajectories  of  motion  leading  to 
the  goal.  Searching  is  performed  within  the  system 
of  representation. 

2.  Planning  in  a  Representation 
Space  with  a  Given  Goal 

The  world  is  assumed  to  be  judged  upon  by 
using  its  Space  of  Representation  which  is 
interpreted  as  a  vector  space  with  a  number  of 
properties.  Any  activity  (motion)  in  the  World 
(Space  of  Representation)  can  be  characterized  by  a 
trajectory of motion along which the "working
point" or "present state" (PS) traverses this
space from one point (the initial state, IS) to one or
many other states (goal states, GS). The goal states
are  given  initially  from  the  external  source  as  a 
"goal  region",  or  a  "goal  subspace"  in  which  the 
goal state is not completely defined in the general
case. One of the stages of planning (often the initial
one) is defining exactly where the GS is within the
"goal region." In this paper, we will focus upon
planning problems in which one or many GS
remain unchanged throughout the whole period of their
achievement. Traversing from IS to GS is
associated with consuming time or another
commodity (cost).




Figure  1.  The  general  paradigm  of  planning 


3.  Learnable  Representations 

All  Representation  Spaces  are  acquired 
from  the  external  reality  by  the  processes  of 
Learning.  Many  types  of  learning  are  mentioned  in 
the  literature  (supervised,  unsupervised, 
reinforcement, dynamic, PAC, etc.) Before
identifying the need for a particular method of learning
and deciding how to learn, we would like to figure
out what we should learn. It is not clear
whether  the  process  of  learning  can  be  separated  into 
two  different  learning  processes: 

•  that  of  objects  representation,  and 

•  that  of  the  rules  of  action  representation, 
or  are  these  two  kinds  of  learning  just  two  sides  of 
the  same  core  learning  process? 

The  following  knowledge  should  be 
contained  in  the  Representation  Space.  If  no  GS  is 
given, any pair of state representations should
contain implicitly the rule of moving from one
state  to  another.  In  this  case,  while  learning  we 
inadvertently  consider  any  second  state  as  a 
provisional  GS. 

We  will  call  "proper"  representation  a 
representation  similar  to  the  mathematical  function 
and/or  field  description:  at  any  point  of  the  space, 
the  derivative  is  available  together  with  the  value  of 
the  function;  the  derivative  can  be  considered  an 
action  required  to  produce  the  change  in  the  value  of 
the  function. 

We  will  call  "goal  oriented"  representation 
a  representation  in  which  at  each  point  a  value  of 
the  action  is  given  required  for  describing  not  the 
best  way  of  achieving  an  adjacent  point  but  the  best 
way  of  achieving  the  final  goal. 

Both "proper" and "goal oriented"
representations can be transformed into each other.

Figure 2. Knowledge of state, and knowledge of
action which produces changes
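
One direction of the transformation between "proper" and "goal
oriented" representations can be sketched in code. Below, a "proper"
representation is assumed to be a table of local moves with
incremental costs (a hypothetical dictionary `moves`), and the "goal
oriented" representation is derived from it by repeated relaxation of
the cost-to-go (a Bellman-Ford style iteration, chosen here for
brevity; the paper does not prescribe an algorithm):

```python
# Sketch: derive a goal-oriented representation (best action at each
# state) from a proper one (local moves with costs).
# Assumptions: finite state set, positive costs, hypothetical toy data.

def goal_oriented(moves, goal):
    """moves: {state: {next_state: cost}}. Returns (cost_to_go, policy)."""
    states = set(moves) | {n for nbrs in moves.values() for n in nbrs}
    inf = float("inf")
    value = {s: inf for s in states}
    value[goal] = 0.0
    policy = {}
    # Relax every move until no cost-to-go improves.
    for _ in range(len(states)):
        changed = False
        for s, nbrs in moves.items():
            for nxt, cost in nbrs.items():
                if cost + value[nxt] < value[s]:
                    value[s] = cost + value[nxt]
                    policy[s] = nxt
                    changed = True
        if not changed:
            break
    return value, policy

moves = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
value, policy = goal_oriented(moves, goal="D")
print(value["A"], policy)
```

The returned `policy` stores, at each state, the move that is best
for reaching the final goal rather than merely an adjacent state,
while `value` plays the role of the cumulative cost of each state.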

4.  The  Artifacts  of  Representation 
Space 

Representation  (that  of  the  World)  can  be 
characterized  by  the  following  artifacts: 

• existence of states with boundaries
determined by the resolution of the space (each state
is presented as a tessellatum [2], or an elementary
unit of representation, the lowest possible bounds
of attention)

• characteristics of the tessellatum, which is
defined as an indistinguishability zone (we consider
that the resolution of the space shows how far the
"adjacent" tessellata (states) are located from the
"present state" (PS))

•  lists  of  coordinate  values  at  a  particular 
tessellatum  in  space  and  time 


• lists of actions to be applied at a particular
tessellatum in space and time in order to achieve a
selected adjacent tessellatum in space and time

•  existence  of  strings  of  states  intermingled  with 
the  strings  of  actions  to  receive  next  consecutive 
tessellata  of  these  strings  of  states 

•  boundaries  (the  largest  possible  bounds  of  the 
space)  and  obstacles 

•  costs  of  traversing  from  a  state  to  a  state  and 
through  strings  of  states. 

In  many  cases,  the  states  contain 
information  which  pertains  to  the  part  of  the  world 
which  is  beyond  our  ability  to  control  it,  and  this 
part  is  called  "environment."  Another  part  of  the 
world  is  to  be  controlled:  this  is  the  system  for 
which  the  planning  is  to  be  performed.  We  will 
refer to it frequently as "self." Thus, part of the
representation is related to "self," including
knowledge about actions which this "self" should
undertake in order to traverse the environment.

It  is  seen  from  the  list  of  artifacts  that  all 
knowledge  is  represented  at  a  particular  resolution. 
Thus,  the  same  reality  can  be  represented  at  many 
resolutions  and  the  "multiresolutional 
representation"  is  presumed. 
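
A minimal sketch of a two-level multiresolutional representation,
assuming a hypothetical occupancy grid (1 = obstacle) and a
coarsening rule in which a low-resolution tessellatum groups a block
of high-resolution tessellata and is blocked if any of them is
blocked (a conservative choice, not taken from the paper):

```python
# Sketch: build a lower-resolution level from a higher-resolution grid.
# Assumption: a coarse cell is an obstacle if any fine cell inside it is.

def coarsen(grid, factor=2):
    """Group factor x factor blocks of cells into one coarse cell."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            row.append(1 if any(block) else 0)
        coarse.append(row)
    return coarse

fine = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(coarsen(fine))  # [[1, 0], [0, 1]]
```

Each coarse cell is an indistinguishability zone for the fine cells
it contains; applying `coarsen` repeatedly yields further levels.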
Figure 3. Multiresolutional representation

5.  Planning  in  Redundant  Systems 

Non-redundant  systems  have  a  unique 
trajectory  of  motion  from  a  state  to  a  state. 
Redundant  system  is  defined  as  a  system  in  which 
there  is  more  than  one  trajectory  of  motion  from 
one  state  to  another.  It  can  be  demonstrated  for 
many  realistic  couples  "system-environment"  that 

• they have a multiplicity of traversing
trajectories from an IS to a GS

• these trajectories can have different costs.

Figure 4. Multiplicity of plan alternatives

These systems contain a multiplicity
of alternatives of space traversal.

Redundancy  grows  when  the  system  is  considered  to 
be  a  stochastic  one.  The  number  of  available 
alternatives  grows  even  higher  when  we  consider 
also  a  multiplicity  of  goal  tessellata  of  a  particular 
level  of  resolution  under  the  condition  of  assigning 
the  goal  at  a  lower  resolution  level  which  is  the  fact 
in  multiresolutional  systems  (such  as  NIST-RCS.) 

In non-redundant systems there is no
problem  of  planning.  Since  the  trajectory  of  motion 
to  be  executed  is  a  unique  one,  the  problem  is  to 
find  this  trajectory  and  to  provide  tracking  of  it  by 
an  appropriate  classical  control  system. 
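
The definition of redundancy can be made operational on a finite
state graph: a system is redundant between two states if more than
one loop-free trajectory connects them. The sketch below counts such
trajectories by depth-first search; the graphs and the function name
`count_trajectories` are hypothetical.

```python
# Sketch: test redundancy by counting loop-free trajectories.
# Assumption: the system-environment couple is given as a small
# directed graph {state: [next_states]}.

def count_trajectories(graph, start, goal, visited=None):
    """Count loop-free trajectories from start to goal."""
    if start == goal:
        return 1
    visited = (visited or set()) | {start}
    return sum(count_trajectories(graph, nxt, goal, visited)
               for nxt in graph.get(start, ())
               if nxt not in visited)

def is_redundant(graph, start, goal):
    return count_trajectories(graph, start, goal) > 1

chain = {"IS": ["M"], "M": ["GS"]}
diamond = {"IS": ["L", "R"], "L": ["GS"], "R": ["GS"]}
print(is_redundant(chain, "IS", "GS"), is_redundant(diamond, "IS", "GS"))
```

Only in the second graph is there anything to plan; in the first the
unique trajectory merely has to be found and tracked.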

6.  Learning  as  a  Source  of 
Representation 

Learning  is  defined  as  knowledge 
acquisition  via  experience  of  functioning.  Thus, 
learning  is  development  and  enhancement  of  the 
representation  space.  The  latter  can  be  characterized 
in  the  following  ways: 

•  by  a  set  of  paths  (to  one  or  more  goals) 
previously  traversed 

•  by  a  set  of  paths  (to  one  or  more  goals) 
previously  found  and  traversed 

•  by  a  set  of  paths  (to  one  or  more  goals) 
previously  found  and  not  traversed 

•  by  a  totality  of  (all  possible)  paths 

•  by  a  set  of  paths  executed  in  the  space  in 
a  random  way. 

One  can  see  that  this  knowledge 
contains  implicitly  both  the  description  of  the 
environment  and  the  description  of  the  actions 
required  to  traverse  a  trajectory  in  this  environment. 


Moreover, if some particular system is the source of
knowledge, then the collected knowledge contains
information about properties of the system which
moved in the environment.

All  this  information  arrives  in  the  form  of 
experiences  which  record  states,  actions  between 
each  couple  of  states,  and  evaluation  of  the 
outcome.  The  collection  of  information  obtained  in 
one  or  several  of  these  ways  forms  knowledge  of 
space,  KS. 

If  the  information  base  contains  all 
tessellata  of  the  space  with  all  costs  among  the 
adjacent  tessellata  -  we  usually  call  it  the 
representation. 

Ergo: the representation is equivalent to
the multiplicity of explanations of how to traverse, or
how to move. In other words: all kinds of learning
mentioned in Section 3 are equivalent.

Comments: a) remember Albus' question
about "knowing states, or knowing derivatives"
(actions) from a state to a state; b) apparently, each
state can be characterized by some cumulative cost
(value), while each traversal from a state to a state
can be characterized by some incremental cost
(goodness of a move or a set of moves).

7.  Types  of  Problems  of  Planning 

Any problem of planning is associated
with

•  actual  existence  of  the  present  state 

•  actual,  or  potential  existence  of  the  goal  state 

•  knowledge  of  the  values  for  all  or  part  of  the 
states  as  far  as  some  particular  goal  is  concerned. 

From  this  knowledge  the  cumulative  costs 
of trajectories to a particular goal (or goals) can be
deduced. On the other hand, the knowledge of costs
for  the  many  trajectories  traversed  in  the  past  can  be 
obtained  which  is  equivalent  to  knowing  cumulative 
costs  from  the  initial  state  (PS)  to  the  goal  state 
(GS)  (from  which  the  values  of  the  states  can  be 
deduced.) 

In  other  words,  any  problem  of  planning 
contains  two  components:  the  first  one  is  to  refine 
the  goal  (bring  it  to  the  higher  resolution.)  The 
second  one  is  to  determine  the  path  to  this  refined 
goal.  These  two  parts  can  be  performed  together,  or 
separately.  Frequently  we  are  dealing  with  them 
separately.  In  the  latter  case  they  are  formulated  as 
follows: 

a)  given  PS,  GS  and  KS  (all  paths)  find 
the  subset  of  KS  with  a  minimum  cost,  or  with 
a  prearranged  cost,  or  with  a  cost  in  a  particular 
interval. 

b) given PS, GS from the lower resolution
level, and KS (all paths), find the GS with a
particular value.
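
Problem a) becomes a minimum-cost path search once KS is stored as a
weighted graph. The sketch below uses Dijkstra's algorithm with the
Python standard library's `heapq`; the dictionary layout of `ks` and
the state names are assumptions made for illustration.

```python
# Sketch: problem a), minimum-cost string of states from PS to GS.
# Assumptions: KS is a dict {state: {next_state: cost}} with
# non-negative incremental costs; the data are hypothetical.
import heapq

def min_cost_path(ks, ps, gs):
    """Return (cost, path) from ps to gs, or (inf, []) if unreachable."""
    frontier = [(0.0, ps, [ps])]
    settled = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == gs:
            return cost, path
        if state in settled:
            continue
        settled.add(state)
        for nxt, step in ks.get(state, {}).items():
            if nxt not in settled:
                heapq.heappush(frontier, (cost + step, nxt, path + [nxt]))
    return float("inf"), []

ks = {
    "PS": {"a": 2.0, "b": 5.0},
    "a": {"b": 1.0, "GS": 6.0},
    "b": {"GS": 2.0},
}
print(min_cost_path(ks, "PS", "GS"))
```

The variants with a prearranged cost or a cost interval would filter
candidate paths by their cumulative cost instead of stopping at the
first, cheapest one.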

8.  The  Duality  of  Planning  and 
Learning  Algorithms 

Finding  solutions  for  these  problems  is 
done  by  a  process  which  we  will  call  planning.  In 
other  words,  planning  is  construction  of  the  goal 
states,  and/or  strings  of  states  connecting  the 
present  state  with  the  goal  states.  As  we  analyze  the 
desirable  processes  of  planning,  we  notice  a  striking 
similarity  between  planning  and  learning,  actually 
their  inseparability. 

[Figure 5 panels: (a) Where is the goal for the HR level? (b) Which
trajectory to choose? Label: GS submitted to HR level as a result]

Figure 5. Two parts of planning problem

Indeed,  the  first  component  of  the  planning 
algorithm  is  translation  of  the  goal  state 
description  from  the  language  of  low  resolution  to 
the  level  of  high  resolution.  We  must  learn  where 
the  goal  is  located,  and  it  is  done  by  consecutive 


refinement  of  the  initial  coarse  information. 
Frequently,  it  is  associated  with  increasing  of  the 
total  number  of  the  state  variables.  In  all  cases  it  is 
associated  with  reduction  of  the  indistinguishability 
zone,  or  the  size  of  the  tessellatum  associated  with  a 
particular  variable.  We  plan  and  learn  by  testing:  in 
the  representation,  for  planning,  and  in  the  reality, 
for  learning. 

The  second  component  is  the  simulation  of 
all  available  alternatives  of  the  motion  from  the 
initial  state,  IS  to  one  or  several  goal  states,  GS  and 
selection  of  the  "best"  trajectory.  Procedurally,  this 
simulation  is  performed  as  a  search,  i.e.  via 
combinatorial  construction  of  all  possible  strings 
(groups).  To  make  this  combinatorial  search  for  a 
desirable  group  more  efficient  we  reduce  the  space  of 
searching  by  focusing  attention. 

The need for planning is determined by the
multialternative character of reality. The process
of planning can be made more efficient by using
appropriate heuristics, which are not considered in
this paper.

9.  The  Unified  System  of  Planning 
and  Learning 

Planning  is  performed  by  searching  within 
a  limited  subspace 

•  for  a  state  with  a  particular  value 
(designing  the  goal) 

• for a string (a group) of states
connecting PS and GS satisfying some conditions
on the cumulative cost (planning of the course of
actions.)

The process of searching is associated either
with collection of additional information about
experiences, or with extracting from KS the
implicit information about the state and moving
from state to state, that is, with learning. In other words,
planning is inseparable from and
complementary to learning.

This unified planning/learning process is
always oriented toward improvement of functioning
in engineering systems (improvement of accuracy in
an adaptive controller) and/or toward increasing the
probability of survival (emergence of advanced
strains of known pathogens that can resist
various medications, e.g. antibiotics.)

Thus,  this  joint  process  can  be  related  to  a 
system  as  well  as  to  populations  of  systems  and 
determines  their  evolution. 

LPA is a tool which allows for jointly
exploring these two fundamental processes of
intelligent systems.

Figure 6. On the relations between planning and
learning

10.  The  Elementary  Components  of 
Planning  and  Learning 

Search  is  performed  by  constructing 
feasible  combinations  of  the  states  within  a 
subspace  (feasible,  means:  satisfying  a  particular  set 
of conditions.) Search is interpreted as exploring
(physically, or in simulation) as many alternatives
of possible motion as possible and comparing them
afterwards.

Each alternative is created by using a
particular law of producing the group of interest
(cluster, string, etc.). Usually, grouping presumes
exploratory construction of possible combinations
of the elements of space (combinatorial search) and,
once one or more of these combinations satisfy the
conditions of "being an entity", substitution of this
group by a new symbol which is subsequently treated
as an object (grouping).

The larger the space of search, the higher
the complexity of search. This is why a special
effort is devoted to reducing the space of search.
This effort is called focusing attention, and it results
in determining two conditions of searching, namely,
its upper and lower boundaries:

a)  the  upper  boundaries  of  the  space  in 
which  the  search  should  be  performed,  and 

b)  the  resolution  of  representation  (the 
lower  boundaries) 
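
The effect of the two boundaries on the size of the search space can be illustrated with a short sketch. It assumes a one-dimensional space sampled on a uniform grid; the function name `candidate_states` and the numbers are hypothetical and chosen only for illustration.

```python
def candidate_states(lower, upper, resolution):
    """Enumerate the grid points of [lower, upper] spaced `resolution` apart.

    `upper - lower` plays the role of the upper boundary of the search space;
    `resolution` is the smallest distinguishable step (the lower boundary).
    """
    if resolution <= 0 or upper < lower:
        raise ValueError("need resolution > 0 and upper >= lower")
    count = int(round((upper - lower) / resolution)) + 1
    return [lower + i * resolution for i in range(count)]

# Whole space at coarse resolution, whole space at fine resolution,
# and a focused window at fine resolution.
coarse = candidate_states(0, 10, 1.0)
fine = candidate_states(0, 10, 0.5)
focused = candidate_states(4, 6, 0.5)
print(len(coarse), len(fine), len(focused))
```

Refining the resolution over the whole space multiplies the number of candidates, while refining it only inside a focused window keeps the count small. In a space of several dimensions the counts multiply across dimensions, so the saving grows accordingly.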

11.  Planning  and  Intelligence 

Formation  of  multiple  combinations  of 
elements  (combinatorial  search,  CS)  satisfying 
required  conditions  of  transforming  them  into 
entities  (grouping,  G)  within  a  bounded  subspace 
(focusing  attention,  FA)  is  a  fundamental  procedure 
in both learning and planning. Since these three
procedures work together, we will refer to them as
a triplet of computational procedures
(GFACS). Notice that in learning it creates lower
resolution levels out of higher resolution levels
(bottom-up), while in planning it progresses from
the lower resolution levels to the higher resolution
levels (top-down).

This  triplet  of  computational  procedures  is 
characteristic  for  intelligence  and  probably  is  the 
elementary  computational  unit  of  intelligence.  Its 
purpose  is  transformation  of  large  volumes  of 
information  into  a  manageable  form  which  ensures 
success  of  functioning.  The  way  it  functions  in  a 
joint  learning-planning  process  explains  the 
pervasive  character  of  hierarchical  architectures  in  all 
domains  of  activities. 
Figure 7. Functioning of GFACS in the joint
learning-planning  process 

The need for GFACS is stimulated by the
property of knowledge representations to contain a
multiplicity of alternatives of space traversal (that
is, the property of representations to be redundant).
Redundancy of representations determines the need
for GFACS: otherwise the known systems would
not be able to function efficiently (it is possible
that redundancy of representations is a precondition
for the possibility of Life and the need for
Intelligence).

12.  Planning,  Learning,  and  Control 
Theory 

Representations reduce the redundancy of
reality. Elimination of redundancy allows for having
problems that can be solved in a closed form (no
combinatorics is possible and/or necessary).
Sometimes this ultimate reduction of redundancy is
impossible, and combinatorial search is the only
way of solving the problem. If the problem cannot
be solved in a closed form, we introduce redundancy
intentionally to enable functioning of GFACS.

At each level of resolution, planning is
done as a reaction to the slow changes in the situation,
which invoke the need for anticipation and active
interference

a)  to  take  advantage  of  the  growing 
opportunities,  or 

b)  to  take  necessary  measures  before  the 
negative  consequences  occur. 

The deviations from a plan are
compensated for by the compensatory mechanism,
also in a reactive manner. Thus, both feedforward
control (planning) and feedback compensation are
reactive activities as far as the system-environment
interaction is concerned. Both can be made active
in their implementation. This explains the different
approaches in control theory.

Examples: a) Classical control systems are systems
with no redundancy; they can be solved in a closed
form. Thus, they do not require any searching.

b) Any stochastics introduced to a control system
creates redundancy and requires either eliminating
the redundancy and bringing the solution to a closed
form, or performing a search.

c) Optimum control allows for a degree of
redundancy which determines the need for searching.

In  Figure  8,  the  process  of 
multiresolutional  planning  via  consecutive  search 
with  focusing  attention  and  grouping  is 
demonstrated  for  the  control  problem  of  finding  a 
minimum-time  motion  trajectory. 

The space is learned in advance by multiple
testing, and its representation is based upon
knowing that the distance, velocity and time are
linked by a simple expression which is sufficient
for obtaining computationally the theoretically
correct solution with an error accepted as
admissible. Several methods of constructing the
envelopes of attention can be applied.
Introduction

In  common  usage,  a  road  map  is  a  piece  of 
paper  with  lines  that  indicate  roads  and  dots  that 
represent  cities.  More  technically,  a  road  map  is 
an  iconic  representation  of  the  surface  of  the  earth 
with  roads  represented  as  lines,  and  destinations 
(or  goals)  represented  as  points,  or  regions  on  the 
map.  In  the  context  of  the  title  of  this  paper,  a 
road  map  implies  a  planned  route  on  the  map 
from  our  present  position  to  some  future  goal. 

What  is  the  goal  for  intelligent  systems  and 
semiotics?  What  should  it  be?  What  could  it 
be? 

In  order  to  decide  on  a  goal,  one  must 
choose  a  cost/benefit  function.  What  are  the 
factors  that  go  into  a  cost/benefit  function? 
What  is  important?  Important  for  what? 

I  will  argue  that  intelligent  systems  and 
semiotics  are  important  for  science,  economic 
prosperity,  military  strength,  and  human  well 
being. 

Intelligent  systems  and  semiotics  are 
important  to  science  because  they  extend  human 
knowledge  into  a  rich  and  largely  uncharted 
region  --  the  mind.  Intelligent  systems  theory 
has  much  to  contribute  to  a  fundamental 
understanding  of  the  mind  and  of  the  nature  of 
perception,  cognition,  and  behavior. 

Intelligent  systems  and  semiotics  are 
important  to  economic  prosperity  because  they 
enable  productivity  improvements  in  the 
production of wealth. Intelligent systems
applications have the potential to reduce the cost
and improve the quality of almost every kind of
product and service.

Intelligent  systems  and  semiotics  are 
important  to  military  strength  because  they  will 
enable  a  whole  new  generation  of  battlefield 
sensors  and  weapons  that  will  revolutionize  the 
art  of  war.  Information  dominance  and  unmanned 
weapons  systems  have  the  capacity  to  increase 
the effectiveness and reduce the risk of projecting
military power to trouble spots throughout the
world.

Intelligent  systems  and  semiotics  are 
important  to  human  well  being  because  they  will 
enable  us  to  clean  up  the  environment,  to  switch 
to  more  efficient  means  of  production,  and  to 
improve  health  care,  education,  public  safety,  and 
personal  security  for  everyone.  In  short, 
intelligent  systems  have  the  potential  to  provide 
the  means  to  end  poverty  and  bring  about  a 
golden  age  for  human  kind. 

Important  for  Science 

I  would  argue  that  all  of  science  can  be 
summed  up  in  three  great  questions: 

1.  What  is  the  nature  of  matter?

2.  What  is  the  nature  of  life? 

3.  What  is  the  nature  of  mind? 

Over  the  past  200  years,  research  in  the 
physical  sciences  has  produced  a  wealth  of 
knowledge  about  the  nature  of  matter,  both  on 
our  own  planet  and  in  the  distant  galaxies.  We 
now  understand  the  forces  and  particles  that  make 
up  atoms  and  their  constituent  parts:  what  holds 
them together, and what gives them their
properties. 

Over  the  past  half  century,  the  biological 
sciences  have  produced  a  revolution  in  knowledge 
about  the  nature  of  life.  We  have  a  profound 
understanding  of  the  molecular  basis  of  life.  We 
are  well  on  our  way  to  mapping  the  human 
genome.  We  may  soon  understand  how  to  cure 
cancer  and  prevent  AIDS.  We  will  see  the 
development  of  new  drugs  and  new  sources  of 
food.  Within  the  next  century,  we  may  eradicate 
most  genetic  diseases. 

However, of all the questions in science,
the deepest and most profound may be: "What
is mind?"

This  question  can  be  asked  in  a  number  of 
different  ways.  From  the  time  of  Moses, 
prophets  and  priests  have  asked,  "What  is  the 
nature  of  the  soul?"  Since  the  time  of  Aristotle, 
philosophers  and  theologians  have  asked,  "What 
is  the  relationship  between  mind  and  body?"  For 
at  least  a  century,  psychologists  have  asked, 
"What  is  a  thought?  What  does  it  mean  to  think 
about  something?  How  do  we  imagine  things?" 
Psychophysicists  ask,  "What  is  perception,  and 
how  is  it  related  to  the  world  that  is  perceived?" 
Psychiatrists  ask,  "What  are  emotions  and 
dreams,  and  why  do  we  have  them?" 
Psychologists  want  to  know,  "What  is 
motivation  and  intention,  and  how  do  we  decide 
what  we  intend  to  do?"  Neurophysiologists  ask, 
"How  do  we  convert  intention  into  action  and 
sensation  into  feeling?"  Researchers  in 
linguistics  and  semiotics  ask,  "How  do  we 
generate  and  use  language  and  how  do  we 
represent  knowledge  about  the  world  in  our 
minds?"  Researchers  in  artificial  intelligence 
and  operations  research  ask,  "How  do  we  plan, 
and  how  do  we  solve  problems  and  reason  about 
the  world?" 

Intelligent  systems  from  a  semiotic 
perspective  is  the  study  of  mind.  The  study  of 
intelligent  systems  addresses  what  is  arguably 
THE  most  important  question  in  science  —  What 
is  the  nature  of  mind? 

But  let  me  move  on  to  the  issue  of  the 
importance  of  Intelligent  Systems  and  Semiotics 
to  economic  prosperity. 


Important  for  Economic  Prosperity 

Intelligent  manufacturing  systems  are 
important  economically  because  they  can 
dramatically  increase  productivity. 

Toffler  has  suggested  that  there  have  been 
three  pivotal  groups  of  inventions  that  shaped  the 
course  of  human  history.  The  first  was  the 
invention  of  agriculture,  which  made  it  possible 
to  augment  hunting  and  gathering  with  the  more 
efficient  and  less  dangerous  processes  of 
cultivation  of  crops  and  husbandry  of  animals. 
This  brought  about  the  development  of  cities,  the 
organization  of  nations,  and  the  rise  of 
civilization. 

The  second  group  of  inventions  was  the 
invention  of  the  steam  engine,  the  discovery  of 
electricity,  and  the  development  of  the  internal 
combustion  engine.  These  made  possible  the 
augmentation  of  muscle  power  with  mechanical 
energy  in  the  production  of  goods  and  services. 
This  caused  the  demise  of  slavery  and  the  rise  of 
middle-class  prosperity. 

Toffler's  Third  Wave  is  based  on  the 
invention  of  electronics  and  the  digital  computer. 
This  is  enabling  the  augmentation  of  brain  power 
with  computer  power  in  the  production  of  goods 
and  services.  Intelligent  systems  have  the 
capacity  to  increase  industrial  productivity  by 
orders  of  magnitude.  This  could  reduce  the  cost 
of  production  and  enable  wealth  to  be  generated  at 
a  rate  sufficient  to  eliminate  both  poverty  and 
pollution  and  create  a  world  of  prosperity  in 
which  everyone  could  be  economically  secure  and 
financially  independent.  Intelligent  machines 
could  become  the  economic  equivalent  of 
mechanical  slaves.  Machine  owners  would 
become  the  modern  equivalent  of  an  aristocracy, 
and  with  the  proper  economic  policy,  everyone 
could  become  an  owner  of  wealth  producing 
machines. 

Let me be more specific about how
intelligent machines and semiotics can be
important for economic prosperity.

Manufacturing 

Intelligent  manufacturing  systems  can 
dramatically  reduce  the  cost  and  improve  the 
quality  of  cars,  trucks,  airplanes,  appliances, 
furniture,  clothing,  food  products,  electronics, 
optics,  drugs,  chemicals,  construction  equipment, 
mining and drilling equipment, farm machinery,
railroads,  ships,  and  weapons  systems. 

For  example,  intelligent  machine 
controllers  based  on  inexpensive  personal 
computer  technology  will  soon  make  it  possible 
to  automatically  generate  programs  for  lifting, 
positioning,  cutting,  joining,  machining, 
forming,  finishing,  and  assembly  operations  in 
factories and shops throughout the world.
On-line monitoring of material flow and machine
availability  will  enable  real-time  production 
planning  and  scheduling.  Advanced  sensory 
perception  systems  will  enable  on-line  inspection 
and  testing.  Intelligent  adaptive  control  systems 
will  increase  the  efficiency  and  reduce  the  cost  of 
plants,  processes,  and  machines. 

The manufacturing industries produce $1.1
trillion of goods and services each year in the
United States. Every percentage point of
improvement in manufacturing productivity
produces an increase of over $10 billion in the
nation's wealth production.

Business  management 

Intelligent  systems  technologies  are 
becoming  increasingly  important  for  business 
management.  Computers  are  already  involved  in 
design,  planning,  scheduling,  word  processing, 
business  management,  financial  services, 
marketing,  and  customer  services. 

Transportation  Safety  and  Efficiency 

Intelligent  systems  technologies  are  about 
to  have  a  revolutionary  impact  on  cars  and 
highways  around  the  world.  For  more  than  a 
decade  in  Europe  and  Japan,  and  now  in  the 
United  States,  serious  efforts  are  being  directed 
toward  Intelligent  Highway  and  Vehicle  Systems. 
In  Germany,  experimental  vision-guided 
automatic  automobiles  are  undergoing  regular 
tests  driving  the  streets  and  highways  in  traffic  at 
normal  speeds,  almost  completely  without 
human  assistance.  A  sedan  recently  drove  from 
Munich  to  Copenhagen  under  automatic  control 
over  95%  of  the  time.  In  Japan,  drivers  can  view 
detailed  maps  of  streets  with  directory  assistance, 
on-line  traffic  information,  and  voice 
input/output.  In  the  U.S.,  an  automobile  under 
computer  control  recently  drove  from  the  east 
coast  to  San  Diego  largely  without  human 
assistance. 


Advanced  cruise  control  with  collision 
avoidance  will  enhance  safety  by  alerting  drivers 
that  have  gone  to  sleep  or  whose  attention  has 
wandered.  Automatic  lane  following  and 
distance  keeping  will  improve  throughput  and 
increase  safety  on  congested  freeways.  This  may 
significantly  reduce  the  need  to  build  additional 
lanes  in  areas  where  construction  is  restricted  by 
the  availability  of  space. 

Intelligent  systems  can  also  improve  airline 
safety  and  prevent  most  rail  and  ship  collisions. 
A  large  percentage  of  accidents  are  caused  by 
human  operator  errors.  Most  of  these  could  be 
prevented  by  intelligent  systems  technology. 

Communications 

Intelligent  systems  technologies  are  already 
having  a  profound  impact  on  the 
communications  industries.  The  internet, 
satellite  and  cable  television,  and  cellular  phone 
systems  are  creating  a  communications  network 
that  rivals  in  bandwidth,  complexity,  and 
sophistication  the  network  of  neural  fibers  that 
interconnect  various  parts  in  the  human  brain. 
The  potential  for  productivity  improvements 
from  these  new  technologies  is  beyond  our 
ability  to  predict. 

Construction 

Studies  of  the  construction  industry  have 
suggested  that  significant  benefits  can  be 
achieved  from  integrating  computer  design  data 
with  on-site  measurements  to  enable  real-time 
planning  and  scheduling  and  inventory  tracking. 
Intelligent  measurement  systems  can  assure  that 
construction  tolerances  are  met  and  parts  fit 
together  without  modification.  Intelligent 
planning  systems  can  assure  that  parts,  tools,  and 
materials  arrive  at  the  right  place,  at  the  right 
time,  and  in  the  right  order  for  construction 
operations  to  flow  seamlessly.  Complete  "as- 
built"  records  can  be  kept  for  future  reference 
during  maintenance,  repair,  and  future 
modifications. 

Intelligent  systems  technology  for 
construction  can  result  in  improved  productivity, 
lower  cost,  and  higher  quality  in  the  construction 
of factories, plants, high rise buildings, homes,
highways, bridges, tunnels, port facilities, and sewer,
water, electricity, and gas utilities.
The construction industry produces about
$460  billion  per  year.  Each  percentage  point 
improvement  in  construction  productivity  growth 
rate  produces  a  $4.6  billion  increase  in  the 
nation's  rate  of  wealth  production. 
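
The arithmetic behind this figure, and behind the corresponding manufacturing figure given earlier, can be checked with a short sketch. It assumes that a one-point productivity improvement translates directly into one percent more output from the same inputs; that reading, and the function name `one_point_gain`, are assumptions made here for illustration.

```python
def one_point_gain(annual_output_dollars):
    """Extra annual output from a one-percentage-point productivity gain,
    assuming output rises by exactly 1% with inputs held constant."""
    return annual_output_dollars / 100

manufacturing_output = 1_100_000_000_000  # $1.1 trillion per year (from the text)
construction_output = 460_000_000_000     # $460 billion per year (from the text)

print(one_point_gain(manufacturing_output))  # consistent with "over $10 billion"
print(one_point_gain(construction_output))   # consistent with "$4.6 billion"
```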

Waste  Management 

Intelligent  systems  will  enable  improved 
methods  for  toxic  and  radioactive  waste  handling 
and  cleanup  of  waste  dumps.  New  approaches  to 
trash  collection,  recycling,  and  pollution 
monitoring  will  enable  environmental  restoration 
and  preservation. 

Hospital  and  Nursing  Support 

Intelligent  fetch  and  carry  robots  are  already 
used  every  day  in  hospitals.  Computers  are  used 
for  patient  record  keeping,  patient  monitoring, 
and  diagnostic  aids.  Intelligent  systems  can  be 
used  in  laboratory  analysis,  drug  manufacturing, 
and  prescription  filling.  These  applications  could 
reduce  costs  and  improve  care  for  hospital 
patients. 

In-home  Patient  Services 

Intelligent  systems  could  be  developed  for 
lifting  and  positioning  invalid  and  infirm  patients 
in  and  out  of  bed,  on  and  off  of  the  toilet,  and  in 
and  out  of  the  bath.  Intelligent  systems  can 
provide  mobility,  food  preparation,  physical 
therapy,  security  and  health  monitoring, 
telecommuting  for  work  and  shopping,  and 
entertainment  and  education  for  the  home  patient. 
This  can  reduce  the  cost  and  improve  the  life  of 
patients  that  prefer  to  remain  at  home  rather  than 
be  institutionalized  in  nursing  care  facilities. 

For  example,  around  100,000  persons  enter 
nursing  homes  every  month  in  the  United  States 
at  a  cost  of  more  than  $2000  per  month. 
Delaying  the  average  date  of  entry  of  the  elderly 
to  nursing  homes  by  only  one  month  would  save 
the  country  about  $2.4  billion  per  year. 
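
The savings estimate follows from the figures quoted in this paragraph. A minimal check, assuming each person who would have entered a nursing home avoids exactly one month of cost at the quoted rate (the function name `annual_savings` is hypothetical):

```python
def annual_savings(entrants_per_month, cost_per_month, months_delayed=1):
    """Yearly savings when every new entrant's admission is delayed.

    Each of the 12 monthly cohorts avoids `months_delayed` months of cost.
    """
    return entrants_per_month * 12 * cost_per_month * months_delayed

# Figures from the text: 100,000 entrants per month at $2000 per month.
print(annual_savings(100_000, 2_000))  # 2400000000, i.e. about $2.4 billion
```

Because the text gives the monthly cost as "more than $2000", this value is best read as a lower bound.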

Physical  Security 

Intelligent  systems  will  enable  advanced 
security  systems  for  the  detection  and  tracking  of 
intruders  with  a  minimum  of  false  alarms. 

Agriculture  and  Food  Processing 

Robotics  and  intelligent  machine  systems 
have  only  begun  to  enter  the  field  of  agriculture 
and food processing industries. The application
of intelligent systems technologies to farming
the oceans has not even begun to be explored.

Mining  and  Drilling 

Three-fifths  of  the  earth's  surface  is  too 
deep  beneath  the  oceans  for  mining  or  drilling 
operations  using  conventional  techniques.  This 
means  that  most  of  the  earth's  mineral  resources 
have  never  been  touched.  Intelligent  undersea 
robots  offer  significant  potential  for  productivity 
improvements  in  deep  sea  mining  and  oil  drilling 
operations. 

Space  and  Undersea  Exploration 

Outer  space,  planetary  surfaces,  and  the 
bottom  of  the  ocean  all  share  the  characteristic 
that  manned  exploration  is  extremely  expensive 
and  hazardous.  Intelligent  systems  and  robotics 
promise  to  decrease  the  cost  and  risk  and  increase 
the  amount  of  data  that  can  be  gathered  from 
space  and  undersea  exploration. 

Important  for  Military  Strength 

Intelligent  systems  technology  promises  to 
revolutionize  warfare  through  a  new  generation  of 
sensors,  machines,  decision  aids,  and  weapons 
systems.  Intelligent  weapons  are  already  highly 
advanced.  Cruise  missiles,  smart  bombs, 
unmanned  vehicles,  and  computer  augmented 
command  and  control  systems  are  currently  being 
developed  and  deployed.  These  are  but  the 
vanguard  of  a  whole  new  generation  of  military 
systems  that  will  become  possible  once 
intelligent  systems  becomes  a  mature 
engineering  discipline. 

In  future  wars,  unmanned  air  vehicles, 
ground  vehicles,  ships,  and  undersea  vehicles  will 
be  able  to  outperform  manned  systems.  Many 
military  vehicles  are  limited  in  performance 
because  of  the  acceleration,  vibration,  or  pressure 
limits  of  the  human  body,  or  the  need  of  humans 
to  consume  air,  water,  and  food.  A  great  deal  of 
the  weight  and  power  of  current  military  vehicles 
is  spent  on  armor  and  life  support  systems  that 
are  necessary  to  protect  and  maintain  human 
operators. 

Intelligent  military  systems  will  reduce 
both  the  risk  to  human  personnel  and  the  cost  of 
training  and  readiness.  Unmanned  vehicles  and 
weapons systems put fewer humans in harm's
way and require little or no training to maintain
readiness. Unmanned systems can be stored in
forward  bases  or  at  sea  for  long  periods  of  time  at 
low  cost.  They  can  be  mobilized  quickly  in  an 
emergency.  They  operate  without  fear  under  fire, 
the  first  time  and  every  time. 

Intelligent  systems  can  be  fast  and  effective 
in  gathering,  processing,  and  displaying 
information.  They  can  enable  human 
commanders to be quicker and more thorough in
planning  operations  and  in  replanning  for 
unexpected  events  as  the  battle  evolves. 

In  short,  intelligent  systems  promise  to 
multiply  the  capabilities  of  the  armed  forces, 
while  reducing  casualties  and  hostages  and 
lowering  the  cost  of  training  and  readiness. 

Important  for  Human  Well  Being 

Productivity  growth  depends  on  technology. 
Intelligent  systems  and  semiotics  have  entered  a 
new  phase  of  rapid  development  that  will  produce 
rapid  growth  in  productivity.  Intelligent 
machines  represent  a  new  breed  of  workers  that 
will  be  able  to  produce  more  and  higher  quality 
goods  and  services  at  lower  cost.  There  is  every 
reason  to  believe  that  we  are  on  the  cusp  of  an 
S-curve  where  productivity  can  grow 
exponentially  for  a  long  period  of  time.  Given 
adequate  investment,  there  is  no  reason  to  doubt 
that  productivity  growth  could  return  to  2.5%, 
which  is  the  average  for  the  20th  century.  Or,  it 
could  rise  to  4%  which  is  the  average  for  the 
1960-68  time  frame,  or  even  to  12%  which 
occurred  during  the  period  of  World  War  II 
between  1939  and  1945.  If  this  were  to  occur, 
many  of  the  problems  of  poverty,  disease,  and 
pollution  that  result  from  stagnant  economic 
growth  would  disappear.  Society  could  afford  to 
improve  health  care  and  education,  to  clean  up 
the  environment,  to  adopt  less  wasteful  forms  of 
production  and  consumption,  to  reduce  taxes,  and 
provide  a  minimum  income  for  all. 

Charting  the  Route 

At  this  point,  a  goal  destination  for  our 
road  map  has  been  established.  Our  goal  is  to 
understand  the  nature  of  mind  and  to  build 
intelligent  systems  that  will  improve  economic 
prosperity,  strengthen  military  security,  and 
improve  human  well  being.  I  believe  this  is  a 
worthy  and  desirable  goal. 


Let  me  now  turn  to  the  issue  of  finding  a 
route  from  where  we  are  today  to  the  goal 
destination. 

First,  let  me  say  that  we  have  already  made 
a  good  start  and  progress  is  rapid.  The  study  of 
intelligent  control  systems  is  an  extremely  active 
field.  Research  in  neural  nets,  fuzzy  logic,  and 
genetic  algorithms  is  showing  how  to  recognize 
patterns,  discover  control  laws,  and  control 
complex  industrial  processes.  Computer  science, 
artificial  intelligence,  semiotics,  and  linguistics 
are  probing  the  nature  of  language  and  image 
understanding.  Significant  progress  has  been 
made  in  expert  systems,  rule-based  reasoning, 
planning  and  problem  solving.  Game  theory  and 
operations  research  have  developed  methods  for 
decision-making  in  the  face  of  uncertainty. 
Research  in  sonar,  radar,  and  optical  signal 
processing  has  developed  methods  for  fusing 
sensory  input  from  multiple  sources,  and 
assessing  the  believability  of  noisy  data. 

Robotics  and  autonomous  vehicle  research 
has  produced  advances  in  real-time  sensory 
processing,  world  modeling,  navigation, 
trajectory  generation,  and  obstacle  avoidance. 
Research  in  computer  integrated  manufacturing 
and  process  control  has  produced  intelligent 
hierarchical  controls,  distributed  databases, 
representations  of  object  geometry  and  material 
properties,  data  driven  task  sequencing,  network 
communications,  and  multiprocessor  operating 
systems.  Modern  control  theory  has  developed  a 
precise  understanding  of  stability,  adaptability, 
and  controllability  under  various  conditions  of 
feedback  and  noise.  Intelligent  control  closes 
the  loop  between  sensing  and  acting  through 
perception,  world  modeling,  planning,  and 
control.  Intelligent  control  enables  large 
complex  systems  with  many  sensors  and 
actuators  to  pursue  and  achieve  sophisticated 
goals  in  an  uncertain,  competitive,  and 
sometimes  hostile  environment.  Intelligent 
control  enables  a  system  to  analyze  the  past,  to 
perceive  the  present,  and  to  plan  for  the  future. 
Intelligent  control  enables  systems  to  assess  the 
cost,  risk,  and  benefit  of  past  events  and  future 
plans  and  to  make  choices  between  alternative 
courses  of  action. 

In  the  neurosciences,  much  is  known,  both 
about  the  machinery  of  the  brain  and  the 
processes  that  run  in  it.  Neuroanatomy  has 
produced  extensive  maps  of  the  interconnecting 
pathways making up the structure of the brain.
Neurophysiology  is  demonstrating  how  neurons 
compute  functions  and  communicate 
information.  Much  is  known  about  the 
functionality  of  specific  areas  of  the  brain.  For 
example,  the  areas  responsible  for  vision, 
hearing,  manipulation,  locomotion,  emotions, 
hunger,  thirst,  and  sexual  drive  are  well  explored. 
Within  the  vision  system,  there  are  modules  and 
pathways  that  specialize  in  analysis  of  motion, 
and  others  that  specialize  in  recognition  of  shape. 
The  brain  has  neuronal  mechanisms  for 
estimating  distance  to  visible  surfaces,  for 
tracking  objects,  anticipating  movement,  and  for 
fusing  information  derived  from  eyes  and  ears. 

Neuropharmacology  is  discovering  many  of 
the  transmitter  substances  that  activate  and 
modify  behavior,  that  enable  learning,  that 
dispense  reward  and  punishment,  and  that  assign 
value  to  perceived  objects  and  events. 

Psychophysics  provides  many  clues  as  to 
how  individuals  perceive  objects,  events,  time, 
and  space,  and  how  they  reason  about 
relationships  between  themselves  and  the  external 
world.  Neuropsychology  adds  information  about 
mental  development,  feeling,  emotions,  and 
behavior. 

Technology  is  Accelerating 

The  latter  half  of  the  20th  century  has  seen 
a  number  of  fundamental  breakthroughs  that 
have  radically  altered  the  landscape  of 
technological  possibilities.  Technological 
knowledge  is  not  just  increasing,  it  is 
accelerating.  There  is  an  exponential  growth  in 
technological  progress  that  shows  no  sign  of 
saturation.  For  example: 

1.  The  power  and  speed  (per  unit  cost)  of 
electronic  computing  has  risen  exponentially  by 
more  than  a  factor  of  ten  per  decade  for  almost 
fifty  years.  This  rate  of  progress  shows  no  sign 
of  slowing.  This  means  that  computing  power 
comparable  to  that  which  exists  in  the  brain 
could  be  assembled  in  the  foreseeable  future. 

2.  The  store  of  knowledge  about  how  the 
biological  brain  works  has  also  increased  many 
fold  over  the  last  half  century.  The  neurosciences 
show  no  sign  of  running  up  against  a 
fundamental  barrier  to  understanding. 


3.  The  number  of  scientists  and  engineers 
is  increasing  exponentially.  Over  half  of  all  the 
scientists  and  engineers  that  ever  lived  are  alive 
and  working  today. 

4.  Perhaps  of  most  importance,  the  nature 
of  knowledge  is  such  that  it  is  not  subject  to 
diminishing  returns.  The  more  that  is  known, 
the  easier  it  is  to  discover  new  things.  There  is 
no  limit  to  what  there  is  to  know.  There  is  only 
a  limit  to  the  amount  of  time  and  resources  we 
dedicate  to  the  pursuit  of  knowledge. 
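
The growth rate quoted in item 1 above can be turned into a cumulative factor and an equivalent doubling time. The sketch below takes the quoted "factor of ten per decade" as an exact rate, which makes its results lower bounds, since the text says "more than"; the function names are introduced here for illustration.

```python
import math

def growth_factor(years, factor_per_decade=10.0):
    """Cumulative growth after `years` at a fixed factor per decade."""
    return factor_per_decade ** (years / 10.0)

def doubling_time_years(factor_per_decade=10.0):
    """Years needed to double at a fixed factor per decade."""
    return 10.0 * math.log(2.0) / math.log(factor_per_decade)

print(growth_factor(50))      # five decades: a factor of about 100,000
print(doubling_time_years())  # roughly three years per doubling
```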

So  what  is  lacking? 

The  main  thing  that  is  missing  is  a  widely 
accepted  theoretical  framework  that  ties  together 
all  of  the  diverse  fields  related  to  intelligent 
systems,  semiotics,  and  the  neurosciences.  There 
is  little  symbiosis  between  fields.  Things  are 
constantly  being  rediscovered  in  different  fields 
and  called  by  different  names. 

Also  missing  is  a  widely  accepted  reference 
model  architecture  that  can  support  an 
engineering  methodology  for  designing  and 
building  intelligent  systems.  Several 
architectures  have  been  proposed,  but  none  is 
widely  adopted.  There  are  no  standardized  metrics 
for  measuring  performance  that  can  scientifically 
evaluate  experiments  and  test  theories.  There  are 
no  interface  standards  so  that  intelligent  systems 
can  be  easily  assembled  from  component  parts. 
Finally,  there  is  no  educational  curriculum  for 
producing  scientists  and  engineers  skilled  in  the 
art  of  designing  and  building  intelligent  systems. 

An  Outline  for  a  Theory  of 
Intelligence 

As a result of our research in the Intelligent
Systems Division here at NIST, an outline for a
theory of intelligence has been published (Albus
91). In that paper, the functional elements of
intelligence are defined as behavior generation,
sensory processing, world modeling, and value
judgment. Behavior generation provides the
functions of planning and control. Sensory
processing filters data from sensors, detects
events, recognizes patterns, and analyzes
situations. World modeling stores and maintains
a knowledge database and uses knowledge about
the world to predict the future and simulate the
results of hypothesized plans. Value judgment
functions compute cost, benefit, risk, and
uncertainty.

These  functional  elements  are  supported  by 
a  distributed  knowledge  database  where 
knowledge  about  the  world  and  the  system  is 
represented  in  terms  of  state  variables,  event  and 
entity  frames,  attributes,  maps,  lists,  tasks, 
rules,  equations,  and  processes. 

The functional elements and knowledge
database are integrated by a system architecture
that provides communications, timing, and an
operating system.

Based  on  this  theory,  a  reference  model 
architecture  has  been  developed. 

A diagram of the relationship between the
functional elements and the knowledge database
in this reference model architecture is shown in
Figure 1.

Figure 1. Relationships between the functional elements and the knowledge database within
the NIST reference model architecture for intelligent systems.


Computational  modules  and  a  knowledge 
database  are  interconnected  by  communication 
pathways  that  transmit  information  so  as  to  close 
control  loops,  enable  perception  and  planning, 
and  facilitate  knowledge  acquisition  and  learning. 

In  the  intelligent  system,  values  and  cost 
functions  control  the  selection  of  goals  and 
optimization  of  behavior.  Hypothesized  plans  are 
communicated  from  Behavior  Generation  to 
World  Modeling  for  simulation  and  the  results  of 
simulation  to  Value  Judgment  for  evaluation. 
Evaluations  are  communicated  back  to  Behavior 
Generation  for  decision-making  in  selecting 
goals,  setting  priorities,  focusing  of  attention, 
organizing  perception,  and  controlling  behavior. 
World Modeling uses information in the
knowledge database to predict incoming data.
Sensory processing compares predictions with
observations and computes correlations and
differences. Differences are returned to the world
model to update the knowledge database. This is a
Kalman filtering loop for recursive estimation.
Correlations  between  predictions  and 
observations  signal  recognition  of  events  and 
objects.  Value  judgment  facilitates  learning  by 
computing  what  is  important  or  trivial,  and  what 
is  rewarding  or  punishing.  Real-time  data  in  the 
world  model  closes  a  control  loop  between 
sensory  feedback  and  task  execution.  Within  this 
system  the  deliberative  is  integrated  with  the 
reflexive,  enabling  a  real-time  intelligent  control 
system  to  be  both  goal  driven  and  sensory 
interactive. 
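
The predict, compare, and update cycle described above can be illustrated with a scalar Kalman filter. This is only a minimal sketch under stated assumptions, not the RCS implementation: the constant-state model, the noise variances `q` and `r`, and the function names are all hypothetical choices made for illustration.

```python
def kalman_step(x_est, p_est, z, q=1e-3, r=0.1):
    """One predict/compare/update cycle for a constant scalar state.

    x_est, p_est: current estimate and its variance (the "world model")
    z:            new observation from a sensor
    q, r:         assumed process and measurement noise variances
    """
    # World modeling: predict the incoming observation.
    x_pred = x_est
    p_pred = p_est + q
    # Sensory processing: difference between observation and prediction.
    innovation = z - x_pred
    # Return the difference to the world model, weighted by the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, innovation


def run_filter(observations, x0=0.0, p0=1.0):
    """Apply kalman_step to each observation in turn (recursive estimation)."""
    x, p = x0, p0
    for z in observations:
        x, p, _ = kalman_step(x, p, z)
    return x, p
```

Feeding the filter a steady stream of identical observations drives the estimate toward that value while the variance shrinks, which is the sense in which small differences signal that the world model already accounts for the data.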

Figure 2 is a reference model architecture
for an intelligent shop controller within a
manufacturing enterprise. Each node in the
hierarchical architecture shown in Figure 2
consists of the set of functional elements shown
in Figure 1.
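
The interplay of the functional elements inside one node can be sketched as follows: Behavior Generation proposes candidate plans, World Modeling simulates each one, Value Judgment scores the simulated result, and Behavior Generation selects the best. This is a hedged illustration only. The `Node` class, the `demo_node` helper, and the representation of plans as lists of increments are hypothetical and do not come from the RCS software.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

State = float
Plan = List[float]  # hypothetical: a plan is a sequence of state increments


@dataclass
class Node:
    simulate: Callable[[State, Plan], State]   # World Modeling
    evaluate: Callable[[State, Plan], float]   # Value Judgment (lower is better)

    def select_plan(self, state: State,
                    candidates: Iterable[Plan]) -> Tuple[Plan, float]:
        """Behavior Generation: choose the candidate plan with the lowest cost."""
        best_plan, best_cost = None, float("inf")
        for plan in candidates:
            predicted = self.simulate(state, plan)   # hypothesized plan -> simulation
            cost = self.evaluate(predicted, plan)    # simulation result -> evaluation
            if cost < best_cost:
                best_plan, best_cost = plan, cost
        if best_plan is None:
            raise ValueError("no candidate plans supplied")
        return best_plan, best_cost


def demo_node(goal: State) -> Node:
    """Toy node: cost is the miss distance plus a small charge per plan step."""
    return Node(
        simulate=lambda state, plan: state + sum(plan),
        evaluate=lambda predicted, plan: abs(goal - predicted) + 0.1 * len(plan),
    )
```

For example, `demo_node(10.0).select_plan(0.0, [[5.0, 5.0], [10.0], [3.0, 3.0]])` prefers the single-step plan `[10.0]`, because it reaches the goal with the fewest steps.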

In  each  node  of  the  architecture  shown  in 
Figure  2: 

a  World  Modeling  function  maintains  the 
Knowledge  database,  answers  queries,  predicts 
sensory  input,  and  simulates  the  result  of 
hypothesized  plans; 

a  Sensory  Perception  function  scales  and 
filters  input  from  sensors,  detects  events,  and 
recognizes  objects  and  situations; 


a  Value  Judgment  function  computes  cost, 
risk,  and  benefit,  and  assigns  values  to  recognized 
objects  and  events  and  the  results  of  simulated 
plans; 

a Behavior Generation function uses the
information provided by the World Modeling and
Value Judgment functions to find the best
assignment of tools and resources to agents, to
find the best plan from the current state to a goal
state, and to execute that plan.

Figure 2. An intelligent system architecture for manufacturing.


Typically, control loops are closed in each
node. Each layer of this hierarchical architecture
has a characteristic servo loop bandwidth and a
characteristic planning horizon. For example, at
the servo level the feedback loop might be closed
every 3 milliseconds with a planning horizon of
30 milliseconds. At the primitive level, the
feedback loop may close every 30 milliseconds
with a planning horizon of 300 milliseconds. At
the elemental move level the feedback loop is
closed every 300 milliseconds with a planning
horizon of 3 seconds. At the individual machine
level, the planning horizon may be 30 seconds.
At the workstation level, the planning horizon
may be 5 minutes, and at the cell level an hour.
At the shop level, the planning horizon may
extend a day into the future.
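
The example figures in the preceding paragraph can be collected into a small table to make the scaling between levels explicit. The sketch below is illustrative only: the `Level` type and the helper functions are hypothetical, and loop periods are left as `None` where the text gives no value.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    name: str
    loop_period_ms: Optional[int]   # None where the text states no loop period
    horizon_ms: int                 # planning horizon in milliseconds


LEVELS = [
    Level("servo", 3, 30),
    Level("primitive", 30, 300),
    Level("elemental move", 300, 3_000),
    Level("machine", None, 30_000),
    Level("workstation", None, 5 * 60_000),
    Level("cell", None, 60 * 60_000),
    Level("shop", None, 24 * 60 * 60_000),
]


def horizon_ratios(levels):
    """Ratio of each level's planning horizon to that of the level below it."""
    return [upper.horizon_ms / lower.horizon_ms
            for lower, upper in zip(levels, levels[1:])]


def level_for_horizon(levels, needed_ms):
    """Name of the lowest level whose planning horizon covers needed_ms."""
    for level in levels:
        if level.horizon_ms >= needed_ms:
            return level.name
    return levels[-1].name
```

With these example values, each of the lower levels plans about ten times further ahead than the level beneath it, and the ratio grows to 12 and 24 at the cell and shop levels.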

The architecture is hierarchical in order to
deal with complexity. The hierarchical structure
enables a manufacturing system of arbitrary
complexity to be decomposed into manageable
units. Each level of the hierarchy has a
characteristic range and resolution in time and
space. Each chain of command within the
hierarchy generates a hierarchy of tasks and plans
for a hierarchy of agents. The world model and
knowledge database in the nodes contain a
hierarchy of entities and events. This hierarchical
decomposition enables the development of an
engineering methodology within a systematic
framework of modular components and standard
interfaces that enable designers to easily add
sensors and upgrade processing capabilities.

The intelligent system architecture for
manufacturing shown in Figure 2 is a derivative
of the Real-time Control System (RCS) reference
model architecture for intelligent systems that
has been developed at NIST and elsewhere over
the past two decades. RCS has been used to
develop a wide variety of applications including
robots for machine loading and unloading using
vision and tactile sensors, deburring and
chamfering using active force control, and
integration of an Automated Manufacturing
Research Facility including robots, machine
tools, a turning machine, a coordinate measuring
machine, and automatic guided vehicles. RCS
also has been used to control multiple
autonomous undersea vehicles, a space station
telerobotic system, and a simulated nuclear
submarine operating under ice during a transit of
the Bering Strait. Intelligent controllers have
been developed for coal mine automation, post
office automation, a RoboCrane system, a next
generation inspection system, and an Enhanced
Machine Controller for a five axis machining
center. (Albus 97)

Current work at NIST is concentrating on
developing a systematic engineering
methodology for developing intelligent systems
applications. We are working on generic
templates and software development tools. We
are working with our four sister divisions in the
Manufacturing Engineering Laboratory to
develop a National Advanced Manufacturing
Testbed (NAMT) which will perform research in
virtual and distributed manufacturing. The
NAMT will be a place where advanced laboratory
equipment such as the Hexapod machine tool, the
Next Generation Inspection Systems, the
RoboCrane, the Advanced Deburring and
Chamfering System, and the HMMWV
Unmanned Ground Vehicle can be accessed and
used by experimenters at remote locations
anywhere in the world. Through the NAMT
facilities, experimenters will also be able to
access experimental equipment in other divisions
at NIST, as well as at universities, industry labs,
and other government labs such as Sandia and
Oak Ridge.

Our  Intelligent  Systems  Division  is 
planning  to  distribute  RCS  software  as  freeware 
to  universities  and  to  work  with  professors  to 
develop  lecture  material  and  lab  experiments  that 
can  be  used  for  course  work  in  intelligent 
control.  Through  the  NAMT  virtual  and 
distributed  testbed,  we  will  work  with  professors 
and  students  to  develop  algorithms  and  do 
experiments  and  perform  thesis  research  in 
intelligent  manufacturing,  construction,  and 
unmanned  vehicle  systems.  We  also  hope  to 
collaborate  with  the  National  Science  Foundation 
and  other  sources  of  funding  to  support  users 
groups,  promote  student  competitions,  and  fund 
collaborative  research. 

If  this  is  successful,  we  intend  to  solicit 
industrial  partners  to  propose  an  Advanced 
Technology  Program  targeted  thrust  area  for 
industry  research  in  intelligent  systems. 

Conclusion 

So this is my proposed roadmap for the
future. The goal of the roadmap is to understand
the nature of mind and to build systems that will
improve economic prosperity, strengthen
military security, and improve human
well-being.

The  suggested  route  is: 

(1)  to  develop  a  theory  and  reference  model 
architecture  for  intelligent  systems, 

(2)  to  develop  an  engineering  methodology 
and  educational  curriculum, 

(3)  to  distribute  freeware  and  provide  user 
support, 

(4)  to  build  a  virtual  and  distributed  testbed 
environment  so  that  researchers  in  many  different 
locations  can  work  together,  and 

(5)  to  solicit  partners  and  pursue  additional 
funding  to  advance  the  theory  and  practice  of 
intelligent  systems  and  semiotics. 

In closing, let me reiterate that intelligent
systems are important for a wide variety of
scientific, economic, and military reasons. The
technology of intelligent systems is moving
rapidly, but there are some gaps that impede
progress. Some suggestions have been made to
fill in these gaps and achieve the ultimate goal
of human well-being.

As a final thought, I want to suggest that
intelligent systems represent a fundamentally
new technology that will create a new industrial
revolution. The impact of this new industrial
revolution will at least equal, and may far exceed,
that produced by the invention of the steam
engine and the discovery of electricity. The
ultimate result could be a breakthrough in
productivity growth that would bring an end to
poverty and introduce a golden age for
humankind.

Abstract

Whereas most cognitive approaches in the study of language
have been developing hypotheses concerning the principles of
knowing and understanding natural languages (i.e. competence)
without bothering too much about communicative language usages
in real-world situations (i.e. performance), new semiotic
approaches in cognitive computational linguistics explore the
procedures believed to underlie processes of language learning
and understanding. They do so by simulating these capabilities
as system behaviour under recourse to modeled structures,
observable in very large samples of situated natural language
discourse and represented in vector space formats via
numerically specified quantitative methods of dynamic
(re-)construction. It will be argued that the ecological
understanding of informational systems in Computational
Semiotics corresponds well to the procedural modeling and
numerical reconstruction of processes that simulate the
constitution of meanings and the interpretation of signs
(semiosis). The theories of fuzzy sets [24] and possibility
distributions [23] together with their derivatives in soft
computing [25] appear to be promising in providing suitable
formats for computational approaches to natural language
processing without the obligation either to reject or to accept
traditional formal and model-theoretic concepts or ontologies.
Examples from fuzzy linguistic research [10] [18] will be given
to illustrate these points.

1    Cognitive  Information  Processing 

For the majority of researchers in knowledge representation
and natural language semantics the common ground and widely
accepted frame for their modeling may be found in the dualism of
the rationalistic tradition of thought as exemplified in its
matter-mind notion of an independent (objective) reality and
some (subjective) conception of it.

1.1    The traditional approach

According to the realistic view, the meaning of any portion of
language material (e.g. discourse, utterance, word(token),
morph, phone, etc.) is interpreted as being an instantiation of
(or partly derivable from) certain other entities, called
linguistic categories (e.g. text, sentence, word(type),
morpheme, phoneme, etc.), with the understanding that these
categories structure natural language material according to
their compositional functions. It is by these functions that
language material (strings of terms) appears to be composed of
linguistic entities (aggregates of categories) to form
structures, and it is also by these functions that the quality
of language structures (having meaning) is conceived as being
part of both the physical reality of language material and the
semantic significance of linguistic signs. Illustrating this
twofold membership are the graph-theoretical formats which have
become standard representations for natural language meanings,
both as relational structure and as referential denotation.
Thus, relating arc-and-node configurations with sign-and-term
labels in graphs like trees and nets appears to be but another
aspect of the traditional mind-matter duality according to
which a realm of meanings is presupposed very much like the
assumption of the pre-given structures of the real world
related by signs. Accepting this duality has made it possible to
explain neither where the structures or the labels come from
nor how their mutual relatedness as meanings of signs can be
derived. The emergence of the meaning relation, therefore,
never appeared to be in need of explanatory modeling because
the existence of signs, objects and meanings was taken for
granted and hence seemed to be beyond scrutiny. Under this
presupposition, fundamental semiotic questions of semantics
simply did not come up; they have hardly been asked yet, and
are still far from being solved.

* Presented at the NIST/NSF/IEEE/DARPA International
Multidisciplinary Conference INTELLIGENT SYSTEMS: A SEMIOTIC
PERSPECTIVE, Oct. 20-23, 1996, Gaithersburg, Md, USA

1.2    Modeling  cognition 

Extending an earlier attempt [22] to classify approaches in
cognitive science, we may roughly discern four¹ types of
approaches in modeling cognition:

> the cognitive approaches presuppose the existence of the
external world, structured by given objects and properties,
and the existence of representations of (fragments of) this
world internal to the system, so that the cognitive systems'
(observable) behaviour of action and reaction may be modelled
by processes operating on these structures;

> the associative approach is described as a dynamic
structuring based on the model concept of self-organization,
with cognitive systems constantly adapting to changing
environmental conditions by modifying their internal
representation of them.

Whereas both these approaches apparently draw on the
traditional rationalistic paradigm of mind-matter duality
(static the former, dynamic the latter) in presupposing the
external world structure and an internal representation of it,
the third and fourth category do not:

> the enactive approaches may be characterized as being based
upon the notion of structural coupling. Instead of assuming an
external world and the systems' internal representations of
it, some unity of structural relatedness is considered to be
fundamental to (and the only condition for) any abstracted or
acquired duality in notions of the external and internal,
object and subject, reality and its experience;

> the semiotic approaches focus on the notion of semiosis and
may be characterized by the process of enactment too,
supplemented, however, by the representational impact. It is
considered fundamental to the distinction of e.g. cognitive
processes from their structural results which, due to the
traces these processes leave behind, may emerge in some form
of knowledge whose different representational modes comply
with the distinction of internal or tacit knowledge (i.e.
memory) on the one hand and of external or declarative
knowledge (i.e. language) on the other.

¹ Only the first three of these four approaches were
distinguished by Varela/Thompson/Rosch (1991).

According to these types of cognitive modeling, computational
semiotics can be characterized as aiming at the dynamics of
meaning constitution by simulating processes of
multi-resolutional representation [5] within the frame of an
ecological information processing paradigm [18].

As we take human beings to be systems whose knowledge-based
processing of represented information makes them cognitive,
and whose sign and symbol generation, manipulation, and
understanding capabilities render them semiotic, we may do so
due to our own daily experience of these systems' outstanding
ability to represent results of cognitive processes, organize
these representations, and modify them according to changing
conditions and states of system-environment adaptedness.

2    Computational  Semiotics 

For the semiotic approach to human cognition (constituting
computational semiotics) such representations resulting from
complex semiotic cognitive information processing may be found
in any natural language discourse. In an aggregated form of
pragmatically homogeneous text corpora [7], communicatively
performative natural language discourse provides a cognitively
highly interesting and empirically accessible material whose
extreme structuredness may serve as a guideline for the
cognitively motivated, empirically based, and computationally
realized research in the semiotics of language, too.

Following this line, however, will necessitate passing on from
traditional approaches in competence-oriented linguistics,
which analyse introspectively the propositional contents of
singular sentences as conceived by ideal speakers/writers,
towards a new understanding of meaning constitution as a
dynamic process based upon semiotic cognitive information
processing. The traces of this processing are to be identified
and systematically reconstructed on the basis of empirically
well-founded observation and rigorous mathematical description
of universal regularities that structure and constitute
different levels of representations in masses of pragmatically
homogeneous texts produced by real speakers/writers in actual
situations of either performed or intended communicative
interaction. Only such a performance-oriented semiotic approach
will give a chance to formally reconstruct and model
procedurally both the significance of entities and the meanings
of signs as a function of a first and second order semiotic
embedding relation of situations (or contexts) and of language
games (or cotexts) which corresponds to the two-level
actualisation of cognitive processes in language understanding
[18].

2.1    Ecological  information  systems 

Life may be understood as the ability to survive by adapting
to changing requirements in the real world. In terms of the
theory of information systems, faculties like perception,
identification, and interpretation of structures (external or
internal to a system) may be conceived as a form of information
processing which (natural or artificial) systems, due to their
own structuredness, are able to perform. Thus, living systems
receive or derive information only from relevant portions of
their surrounding environments, they learn from experience, and
change their behaviour accordingly. In contrast to other living
systems which transmit experiential results of environmental
adaptation only biogenetically² to their descendants, human
information processing systems have additional means to convey
their knowledge to others. In addition to the vertical
transmission of system-specific (intraneous) experience through
(biogenetically successive) generations, mankind has
complementarily developed horizontal means of mediating
specific and foreign (extraneous) experience and knowledge to
(biogenetically unrelated) fellow systems within their own or
any later generation. This is made possible by a semiotic move
that allows not only distinguishing processes from results of
experience but also converting the latter to knowledge, so that
it can be re-used, modified and improved in learning. Vehicle
and medium of this move are representations, i.e. complex sign
systems which constitute languages and form structures, called
texts, which may be realized in communicative processes, called
actualisation.

In terms of the theory of information systems, texts, whether
internal or external to the systems, function like virtual
environments³.

² According to standard theory there is no direct genetic
coding of experiential results but rather indirect transmission
of them by selectional advantages which organisms with certain
genetic mutations gain over others without them to survive
under changing environmental conditions.

2.2    Modes  of  Representation 

Considering the system-environment relation, virtuality may be
characterized by the fact that it dispenses with the identity
of space-time coordinates for system-environment pairs which
normally prevails for this relation when qualified to be
indexed real. It appears that this dispensation of identity
(for short: space-time-dispensation) is not only conditional
for the possible distinction of (mutually and relatively
independent) systems from their environments, but also
establishes the notion of representation. Accordingly,
immediate or space-time-identical system-environments existing
in their space-time-identity may well be distinguished from
mediate or space-time-dispensed system-environments whose
particular representational form (texts) corresponds to their
particular status both as language material (being signs) and
as language structure (having meaning). This double identity
calls for a particular modus of actualisation (understanding)
that may be characterized as follows:

For systems appropriately adapted and tuned to such
environments, actualisation consists essentially in a twofold
embedding to realize

> the spatiotemporal identity of pairs of immediate
system-environment coordinates which will let the system
experience the material properties of texts as signs (i.e. by
functions of physical access and mutually homomorphic
appearance). These properties apply to the percepts of
language structures presented to a system in a particular
discourse situation, and

> the representational identity of pairs of mediate
system-environment parameters which will let the system
experience the semantic properties of texts as meanings (i.e.
by functions of emergence, identification, organisation,
representation of structures). These apply to the
comprehension of language structures recognized by a system to
form the described situation.

Hence, according to the theory of information systems,
functions like interpreting signs and understanding meanings
translate to processes which extend the fragments of reality
accessible to a living (natural and possibly artificial)
information processing system. This extension applies to both
the immediate and mediate relations a system may establish
according to its own evolved adaptedness or dispositions (i.e.
innate and acquired structuredness, processing capabilities,
represented knowledge).

³ Simon's (1982) remark "There is a certain arbitrariness in
drawing the boundary between inner and outer environments of
artificial systems. ... Long-term memory operates like a second
environment, parallel to the environment sensed through eyes
and ears" (p. 104) is not a case in point here. Primarily
concerned with where to place the boundary, he does not seem to
see its placing in need to be justified or derived as a
consequence of some possibly representational processes we call
semiotic. As will become clear in what follows, Simon's
distinction of inner (memory structure) and outer (world
structure) environments is not concerned with the special
quality of language signs whose twofold environmental embedding
(textual structure) cuts across that distinction, resolving
both in becoming representational for each other.

2.3    Semiotic  enactment 

Semiotic systems' ability to actualize environmental
representations does not merely add to the amount of
experiential results available, but also constitutes a
significant change in experiential modus. This change is
characterized by the fact that only now the processes of
experiencing may be realized as being different and hence be
separable from the results of experience, which in immediate
system-environments appear to be indistinguishable. Splitting
up experience into experiential processes and experiential
results (the latter being representational and in need of
procedural actualisation by the former) is tantamount to the
emergence of virtual experiences which do not have to be made
but can instead just be tried, very much like hypotheses in an
experimental setting of a testbed. These results, as in
immediate system-environments, may become part of a system's
adaptive knowledge but may also, unlike in immediate
system-environments, be neglected or tested, accepted or
dismissed, repeatedly actualized and re-used without any risk
for the system's own survival, stability or adaptedness.

This in a way experimental quality of textual representations,
which increases the potentials of adaptive information
processing beyond the system's lifespan, is constrained
simultaneously by dynamic structures corresponding to
knowledge. The build-up, employment, and modification of these
structural constraints⁴ is controlled by procedures whose
processes determine cognition and whose results constitute
adaptation. Systems properly attuned to textual
system-environments have acquired these structural constraints
(language learning) and can perform certain operations
efficiently on them (language understanding). These are
prerequisites to recognizing mediate (textual) environments and
to identifying their need for and the systems' own ability to
actualize the mutual (and trifold) relatedness constituting
what Peirce called semiosis⁵. Systems capable of and tuned to
such knowledge-based processes of actualisation will in the
sequel be referred to as semiotic cognitive information
processing systems (SCIPS) [17, 19].

⁴ What Simon (1982) calls memory in accordance with his
questioning of the inner-outer distinction of cognitive systems
and their environments.

⁵ By semiosis I mean [...] an action, or influence, which is,
or involves, a cooperation of three subjects, such as sign, its
object, and its interpretant, this tri-relative influence not
being in any way resolvable into actions between pairs. (Peirce
1906, p. 282)

Representation, therefore, has to be considered fundamental to
the distinction of the processes of cognition from their
results which may emerge, due to the traces these processes
leave behind, in some structure (knowledge). Different
representational modes of this structure not only comply with
the distinction of internal or tacit knowledge (i.e. memory) on
the one hand and of external or declarative knowledge (i.e.
texts) on the other⁶, but these modes also relate to different
types of formats (distributional vs. symbolic), modeling
(connectionist vs. rule-based) and processing (stochastic vs.
deterministic). It is this range of correspondences that Fuzzy
Linguistics is based upon and tries to exploit to come up with
a unifying framework for most of the different approaches
followed so far. Soft categorising appears to be a prerequisite
for fuzzy linguistic modeling, examples of which will
illustrate the notion of dynamic structures emerging from
corpora of natural language discourse when processed.

3    Fuzzy  Linguistic  Modelling 

It  is  far  from  certain  yet  whether — and  if  so,  how — 
semiotic  models  will  help  to  understand  how  struc- 
tures may  emerge  from  orders  of  some  kind  and  how 
these  orders  evolve  from  regularities  which  multitudes 
of  repeatedly  observable  entities  show.  Recent  re- 
search findings,  however,  give  rise  to  expect  that  pro- 
cesses which  determine  regularities  and  assemble  them 
to  form  (intermediate)  representations  whose  proper- 
ties resemble  (or  can  even  be  identified  with)  those 
of  observable  entities  may  indeed  be  responsible  for 
(if  not  identical  with)  the  emergence  and  usage  of 
sign-functional  structures  in  language  understanding 
systems,  both  natural  and  artificial.  As  more  ab- 
stract (theoretical)  levels  of  representation  for  these 
processes  — other  than  their  procedural  modeling — are 
not  (yet)  to  be  assumed,  and  as  any  (formal)  means 
of  deriving  their  possible  results — other  than  by  their 
(operational)  enactment — are  (still)  lacking,  it  has  to 
be  postulated  that  these  processes — independent  of 
all  other  explanatory  paradigms — will  not  only  relate 
but  produce  different  representational  levels  in  a  way 
that  is  formally  controlled  or  computable,  that  can 
be modeled procedurally or algorithmized, and that
may empirically be tested or implemented [4].
Procedural models are understood to denote a class of
(re-)presentational or modeled (re-)constructions of
entities whose interpretation is not (yet) tied to an
underlying theory which would provide the semantics for
the entities (or expressions) that this type of
model presents. Instantiating their defining procedures as
implemented algorithms will result in processes which
produce some observable structures that can only then
be compared to those of the modeled original.

As  some  of  these  procedural  characteristics  have 

6  Whereas  tacit  knowledge  cannot  be  represented  other  than 
by  the  immediate  system-environments'  corresponding  states, 
explicit  knowledge  is  bound  to  acquire  some  formal  properties 
in  order  to  become  externally  presented  and  thereby  part  of  me- 
diate system-environments.  Natural  languages  obviously  pro- 
vide these  formal  properties — as  partly  identified  by  research 
in  linguistic  competence  (principles  knowledge  and  acquisition 
of  language) — whose  enactment — as  investigated  in  studies  on 
natural  language  performance  (production  and  understanding  of 
texts) — draws  cognitively  on  both  bases  of  (explicit  and  tacit) 
knowledge. 


also been claimed by cognitive linguistic approaches and
computational models of language understanding, their
main traits may help to illustrate the different positions
of semiotic modeling in fuzzy linguistics.

3.1    Cognitive  linguistic  strata 

Cognitive  theory  has  long  identified  the  complex  of  lan- 
guage understanding  to  be  a  modular  system  of  sub- 
systems of  information  processing.  The  idea  of  sym- 
bolic representation  and  the  computer  metaphor  of- 
fered a  frame  for  modeling  cognitive  processes,  for- 
mally grounded  on  logical  calculi  and  procedurally 
realized  in  algorithms  operating  on  representational 
structures.  Following  and  partly  supplementing  strata 
of  semiotic  description  and  analysis  of  signs,  differ- 
ent levels  of  modular  aggregation  of  external  and  in- 
ternal information  have  been  distinguished  in  cogni- 
tive models  of  language  understanding.  These  partly 
correspond to and partly cut across the
syntactics-semantics-pragmatics distinction in the semiotic
relatedness of signs, the utterance-discourse-corpus levels
of performative language processing, and the hierarchy
of morpho-phonological, syntactic-sentential and
lexico-semantic descriptions in traditional models of
structural linguistics.

In one of the rare ventures into discussing how
cognitive, i.e. knowledge-based, information processing
mechanisms may be provided with the knowledge bases
they are meant to operate on, and how these knowledge
structures may be related to observable language data,
Bierwisch (1981) sketches a hierarchy of information
processing mechanisms whose representational format
(sets of rewrite rules operating on structured data)
allows algorithms to be formulated and implementations to
be found that guarantee their computability. According to
this schema (Fig. 1) and starting with the
morpho-phonological level, an information processing
mechanism M1 is postulated which receives utterances as
input and produces some associated structures as output.
In doing so, however, the mechanism's performance will
be determined not only by the external input strings
but also by some internal knowledge of elements and
rules which allow the structures identified to be
agglomerated. The acquisition and representation of this
internal knowledge is hypothesized as resulting from a
process M2 which also includes a multitude of rudimentary,
incomplete, and tentative M1-kind processes. M2 is
assumed to be a complex information processing
mechanism whose inputs are corpora of utterances together
with some environmental information, and whose
outputs will be the grammars underlying these utterances.
Again, this mechanism's results will not be determined
completely by the external inputs but also
by some internal structures which are believed to
control the human language faculty in a fundamental way
as so-called linguistic universals. These may (or may
not) be assumed to be derivable as results of an
information processing mechanism M3 whose input is as
comprehensive (or unspecified) as the term languages
might allow.

Figure 1    Figure 2

Schemata of the model hierarchy of cognitive linguistic strata of mechanisms (Bierwisch) as compared to the model tiling
of computational semiotic coverage of procedures (Rieger) for the analysis and representation of (abstracted and
observable) language phenomena.


Taking the relation of inclusion M1 ⊂ M2 to hold
also for M2 ⊂ M3, and considering M1, M2, M3
computationally specifiable procedures of language analysing
processes instead of mere metaphors for some (more or
less plausible) mechanisms of the human mind,
it appears reasonable to consider M3 a collection of
all the processes of methodical analysis, representation,
and comparison of structured sets of utterances from
different languages, including the processes in M1 as a
device that explicitly specifies an utterance's structure
relative to a given grammar, and the processes in M2
as a system that generates a grammar from a corpus of
utterances relative to the given set of universals. This
modeling view allows for the notions of Universals ⇒
Grammar ⇒ Structure to be understood as variables
of theoretical constructs hinged on empirical
regularities observed in Languages, Corpus, Utterance
respectively. Whereas the latter are external representations,
the former are internal to any SCIP-system and
considered external representations only under the
competence linguistic approach to cognitive modeling. As
such they are hypothesized to form a hierarchy of
linguistic (not language) entities which formally specify
a class of other linguistic entities (following the double
arrows in Fig. 1).

The model theoretical and operational problems
inherent in this setup concern the (non-universal and
highly restrictive) representational format which is
assumed to enable the denotation of universals,
grammar and structure, and the essentially top-down,
non-recursive propagation of externally presented but
internally processed results of these mechanisms. Thus,
the performance of M3 in identifying universals and
representing them externally depends crucially on the
efficient performance of M2, which is said to employ
these universals as internal procedural constraints in
order to identify syntactic regularities and represent
them externally in a rule-based format as grammars.
Grammars, in turn, have to be employed as internal
procedural constraints by M1 if this mechanism's
identification processes and the external representation of
their findings are meant to be successful.

Distinguishing  between  these  two  kinds  of  structures 


either external or internal to the mechanisms M
introduced so far, is indicative of the systems
theoretical view proposed in semiotic modeling. It easily
allows these mechanisms to be translated into sets of
procedures which describe and simulate a living
system's ability to process environmental input
(external structures) according to procedural constraints
known to the system (internal structures) in order to
produce some results of this processing. However, it
is not at all compelling to assume
that these procedural constraints and the processing
results need to be represented in a rule-based format.
According  to  an  ecologically  motivated  systems  the- 
oretical view,  systems  enacting  these  processes  under 
boundary  conditions  as  determined  by  their  surround- 
ing environments,  or  their  internal  structuredness,  or 
both,  will  have  to  process  certain  inputs  to  produce 
specified  output  structures.  But  identifying  their  sta- 
tus of  being  at  the  same  time  internal  and  external  to 
the  processing  system  is  tantamount  to  the  method- 
ological dilemma  which  can  solely  be  solved  on  the 
grounds  of  revising  the  representational  mode  and  the 
formatting  constraints  which  the  model  construction 
has  to  be  decided  on  to  allow. 

Following  Chomsky  these  modes  have  been  re- 
stricted to  abstract  principles  of  language  competence 
by  processes  [2]  whose  assumed  rule-based  determinacy 
consequently  led  to  formal  representations  of  these 
rules  giving  rise  to  the  above  model  hierarchy  of  dis- 
crete strata  [1].  In  trying  to  relate  these  strata  to  ob- 
servable performative  language  data  structures  in  or- 
der to  mediate  observable  language  regularities  with 
theoretical  constructs  supposedly  representing  princi- 
ples underlying  these  constructs,  the  methodological 
shortcomings  of  the  cognitive  linguistic  approach  are 
revealed. It suffers from competence theoretically
inspired idealisations of regularities and theoretical
abstractions (like universals, grammars, sentences) whose
symbolic notations and formal expressions may be
scrutinized for their syntactic correctness but lack
empirically observable and experimentally testable
procedures of language representation which are
independent of competent speakers' understanding of that
language.

3.2    Fuzzy  linguistic  tiling 

Other  than  cognitive  linguistic  and  competence  the- 
oretical mechanisms,  we  propose  cognitive  semiotic  and 
language  performative  procedures  which  redefine  the 
modularity  of  language  understanding  as  an  overlap- 
ping covering  of  computational  processes.  The  classi- 
cal and  coarse  three-stage  description  and  modeling  of 
linguistic  regularities  will  be  replaced  or  rather  com- 
plemented [11]  by  a  multi-stage  covering  of  semiotic 
procedures M_{n-1} to M_{n+k} (Fig. 2) which allows for
the definition of more adequate soft categories or
intermediate representations to fit regularities of entities
on any level. Their essentially cognitive character will
not be borrowed from any predefined strata (and their
purportedly related abstract categories) but is to be
derived as a result of their performance, i.e. the ability to
transform linearly structured entities (strings) of one
level to multi-dimensional structures of entities
(vectors) on another.

This is achieved by analysing the linear or
syntagmatic and selective or paradigmatic constraints which
natural language structure imposes on the formation of
(strings of) linguistic entities on whatever level of
entity formation. It has been shown and illustrated
elsewhere [15], [9] in some detail that fuzzy linguistic
modeling makes it possible to derive the representational
means (e.g. soft categories, continuous gradation, variable
granularity, flexible plasticity, dynamic approximation, etc.)
which crisp categories and competence theoretically
inspired idealisations of performative regularities lack.
The  (numerical)  specificity  and  (procedural)  definite- 
ness  of  sub-symbolic,  distributed  formats  in  entity  for- 
mation appear  to  provide  for  higher  phenomenological 
compatibility  and  more  cognitive  adequacy  than  tra- 
ditional levels  of  categorial  representation  whose  sym- 
bolic mediation  and  syntactic  correctness  could  only 
formally  be  scrutinized  but  not  empirically  or  experi- 
mentally be  tested  [11]. 

4    Procedural  (Re-)Construction

The success of computational language analysis and
generation is based upon adequate structural
descriptions of input strings and their semantic
interpretations. This was assumed to be made possible by the
correctness of rule-based representations of (syntactic
and lexical) knowledge of language and of (referential
and situative) world knowledge on which grammar
formalisms and deductive inferential mechanisms can
operate. Although the essentially static
representations of structures in this kind of cognitive modeling of
language processing (based on monotonic logics,
symbolic representations, rule-based operations, serial
processing, etc.) have produced considerable advances in
formal theory and the consistent development of
increasingly more complex systems, the idea of
representing language entities as essentially crisp,
categorial-type linguistic entities is proving increasingly
problematic. As the processing of very large language
corpora (VLLC)7 has made clear, traditional linguistic
categories do not reduce but increase model
complexity when applied to regularities and structures which
quantitative-numerical means may easily identify and
represent. Trying to map such sub-rule regularities and
sub-symbolic structures to inadequate categories will
generally result in a large number of borderline cases,
variations, and ambiguities which then have to be dealt
with, but which could more easily be avoided from
the very start.

4.1    Exploiting  constraints 

Structural linguistics has given substantial hints8 on
how language items come to be employed in
communicative discourse the way they are. It has
identified the fundamental constraints that control the
multi-level combinability and formation of language
entities by distinguishing the restrictions on linear
aggregation of elements (syntagmatics) from restrictions
on their selective replacement (paradigmatics). This
distinction makes it possible, within any sufficiently large
set of strings of natural language discourse, to ascertain
syntagmatic regularities of element aggregations on level
n whose characteristic patterns form paradigmatic
regularities tantamount to their aggregational status on
level n + 1. As has been illustrated above, the
distinction of representational levels is tantamount to the
categorial constraints applied when identifying
regularities. Fully deterministic if-then rules will result in a
rather coarse three-level hierarchy of categorial
description whereas probabilistic or possibilistic dependency
produces a continuous, multi-level covering of
distributional representations. Thus, cognitive linguistic and
semiotic procedures can be distinguished sharply:
the computations of the latter transform
structured input data according to its immanent
regularities to yield new structural representations emerging
from that computation (as hypothesized by
performative linguistics and realized in procedural models of
computational semiotics). The elements of these new
structures are value distributions or vectors of input
entities that depict properties of their structural
relatedness, constituting multi-dimensional (metric) space
structures (semiotic spaces). The elements may also

7The Trier dpa-Corpus, for instance, comprises the
complete textual materials from the so-called basic news real
service of 1990-1993 (720,000 documents), which the Deutsche
Presseagentur (dpa), Hamburg, kindly provided to the
author for research purposes. After deletion of
editing commands the Trier dpa-Corpus consists of approx. 180 million
(18 · 10^7) running words (tokens) for which an automatic tagging
and lemmatising tool is under development. It is this corpus
which provides the performative data of written language use
for the current (and planned) fuzzy-linguistic projects at our
department.

8In subscribing to a structuralistic view of natural languages,
the distinction of langue-parole and competence-performance in
modern linguistics allows for different levels of language
description and linguistic analysis. Being able to segment strings of
language discourse and to categorize types of linguistic entities,
however, is but making analytical use of structural couplings
presented by natural language discourse to semiotic systems
properly attuned.

 n          F_n               T_n   F_n/T_n (%)          A_n   F_n/A_n (%)
 1  [illegible]
 2  [illegible]
 3       10,175            29,791        34.154       25,327        40.174
 4       54,470           923,521         5.898      315,425        17.268
 5      164,045        28,629,151         0.572    1,688,570         9.715
 6      357,632       887,503,681         0.040    5,085,395         7.032
 7      634,767    27,512,614,111         0.002   11,086,592         5.725

Size of test-corpus: 3,648,326 (signs); 502,587 (words)

Table 1: Graph-(letter-)combinatorics with (theoretically and factually) possible and actually occurring types of
n-grams in a subset of the Trier dpa-Corpus


be interpreted as fuzzy sets, allowing set theoretical
operations to be exercised on them that do not require
categorial-type (crisp) definitions of concept formations.
Computation of letter (morphic) vectors in word space,
derived from n-grams of letters (graphemes) [10] [11], as
well as of word (semic) vectors in semantic space [12]
[13], derived from word-type correlations of their tokens
in discourse, will serve to illustrate the operational
flexibility and fine granularity of vector notations [9], [16] to
identify regularities of semiotic meaning constitution in
language performance which traditional linguistic
categories fail to represent.
4.2    The  word  space 

The  following  notations  will  be  used  to  outline  the  com- 
putational semiotic  approach  on  the  morphic  level: 

n-grams are n-elementary strings of entities. For $n \geq 2$
they may be analysed as ordered pairs of adjacent
items (letters, graphs, sign-strings, word-strings,
etc.) which are the basis of

abstractions over such items, procedurally determined
as soft categorial types (corresponding to characters,
graphemes, morphemes, syllables, words, etc.).
These have been introduced as dispositional dependency
structures (DDS) [14] [8] and formally declared
as structured

fuzzy (sub-)sets of multi-dimensional sign inventories
$Z^n$ with $n \geq 1$

$$X_n := \{(x, \mu_n(x)) : x \in Z^n\} \subseteq Z^n \times [0,1] \tag{1}$$

whose elements' grades of membership are defined by
the membership-function

$$\mu_n : Z^n \to [0,1] \tag{2}$$

membership-values $\mu_n(x)$ may be computed inductively
as the overall tendency of linear chaining of
items in language corpora. For an n-elementary
string $x \in Z^n$ let $H_n(x)$ be the frequency of x
occurring in a corpus. Then, for any

bi-gram $x = (y, z) \in Z^n$, $y \in Z^{n-1}$, $z \in Z$, the coefficient

$$\mu_n(x) = \frac{H_n(x)}{H_{n-1}(y)} \tag{3}$$

with $Z = \{z_1, \ldots, z_m\}$ will yield for each $y \in Z^{n-1}$ a
vector

$$(\mu_n(y, z_1), \ldots, \mu_n(y, z_m))^T \in \mathbb{R}^m \tag{4}$$

The set of all vectors reflects the morphological
structure of the corpus analysed, which is the numerically
specified basis for the procedural definition of

soft categories which are defined as a system of fuzzy
sub-sets of observed chaining regularities. They may
be interpreted to represent elastic constraints
operating on the language items' chaining tendencies which
structure the corresponding corpus.

The presentation of the development of soft categories
as elastic constraints (operating on different levels) can
be simplified by their formal introduction as (n-ary)
fuzzy relations and their corresponding numerical
formats of transition matrices (of higher orders).

For written German discourse analysed on
type-setting level with m discernible types of signs (letters)
and maximum lengths n of strings there are quite a
number of theoretically possible (Tab. 1, col. $T_n$) crisp
n-ary relations $T_n = Z^n$, i.e.

$$\begin{aligned}
T_1 &= Z = \{x_1 : x_1 \in Z\} \\
T_2 &= \{(x_1, x_2) : x_1, x_2 \in Z\} \\
T_3 &= \{(x_1, x_2, x_3) : x_1, x_2, x_3 \in Z\} \\
&\;\;\vdots \\
T_{n-1} &= \{(x_1, \ldots, x_{n-1}) : x_1, \ldots, x_{n-1} \in Z\} \\
T_n &= \{(x_1, \ldots, x_n) : x_1, \ldots, x_n \in Z\}.
\end{aligned}$$

Out of these, however, only those have to be computed
which are not only actually possible (col. $A_n$) but
which have indeed been observed to factually occur,
i.e. $F_n \subseteq F_{n-1} \times Z$ (Tab. 1, col. $F_n$), i.e.

$$\begin{aligned}
F_1 &\subseteq \{x_1 : x_1 \in Z\} \\
F_2 &\subseteq \{(x_1, x_2) : x_1 \in F_1, x_2 \in Z\} \\
F_3 &\subseteq \{(x_1, x_2, x_3) : (x_1, x_2) \in F_2, x_3 \in Z\} \\
&\;\;\vdots \\
F_{n-1} &\subseteq \{(x_1, \ldots, x_{n-1}) : (x_1, \ldots, x_{n-2}) \in F_{n-2}, x_{n-1} \in Z\} \\
F_n &\subseteq \{(x_1, \ldots, x_n) : (x_1, \ldots, x_{n-1}) \in F_{n-1}, x_n \in Z\}.
\end{aligned}$$
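The counting scheme behind Eqns. 3 and 4 and the columns of Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy string "abracadabra" stands in for a corpus, the function names are hypothetical, and n-grams are taken at every position of one unsegmented string. For orientation, the $T_n$ column of Table 1 is consistent with m = 31 sign types, since 31^3 = 29,791.

```python
from collections import Counter

def ngram_counts(text, n):
    """H_n: frequency of every n-gram (as a string) occurring in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def transition_tendencies(text, n):
    """mu_n(y z) = H_n(y z) / H_{n-1}(y) for every observed n-gram (Eqn. 3)."""
    if n < 2:
        raise ValueError("transition tendencies need n >= 2")
    h_n = ngram_counts(text, n)
    h_prev = ngram_counts(text, n - 1)
    return {x: h_n[x] / h_prev[x[:-1]] for x in h_n}

def chaining_vector(mu, y, alphabet):
    """Vector (mu_n(y, z_1), ..., mu_n(y, z_m)) over the sign inventory (Eqn. 4)."""
    return [mu.get(y + z, 0.0) for z in alphabet]

# Toy corpus (an assumption for illustration, not data from the paper).
corpus = "abracadabra"
alphabet = sorted(set(corpus))      # sign inventory Z, here m = 5
m = len(alphabet)

for n in (1, 2, 3):
    f_n = len(ngram_counts(corpus, n))   # |F_n|: n-gram types that occur
    t_n = m ** n                         # |T_n| = m^n: theoretically possible
    print(n, f_n, t_n, round(100 * f_n / t_n, 3))

mu2 = transition_tendencies(corpus, 2)
print(chaining_vector(mu2, "a", alphabet))
```

Only n-grams that actually occur are ever stored, which mirrors the restriction from $T_n$ to $F_n$: the share of occupied types shrinks quickly as n grows, even in this tiny example.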

The fuzzy relational modeling (Eqns. 3 and 2) shows
that even for higher n only bi-grams have to be traced
and computed, due to the (n - 1)-ary relations computed
on the previous level of representation. It is this
principle of procedural self-similarity of n-ary agglomerative
steps which allows for the trie-like representation [3] of
entities that are labeled (by soft categorial n-relative
letter transitions) and are an outcome of procedural
constraints (over n levels of processing) which produce
a dynamically structured system of fuzzy relations that
depicts the overall transition tendencies of signs. For
the letter Z this structure is given in Fig. 4 illustrating
sub-regularities of morphic word formation.

Figure 4: Tree representation of procedural soft category Z depicting the hierarchy of graded letter agglomeration
according to decreasing transition tendencies (in 7-grams) of German newspaper texts.
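A trie of this kind can be sketched as follows. The sketch assumes a plain nested-dictionary trie and a toy string in place of the newspaper corpus; the names `build_ngram_trie` and `ranked_transitions` are hypothetical. Each node on a path y stores how often y occurs, so the ratio of a child's count to its parent's count is the transition tendency of Eqn. 3, and sorting children by that ratio gives the "decreasing transition tendencies" ordering of Fig. 4.

```python
def new_node():
    """A trie node: occurrence count of the path leading here, plus children."""
    return {"count": 0, "children": {}}

def build_ngram_trie(text, max_n):
    """Insert the n-gram (n <= max_n) starting at every position of text."""
    root = new_node()
    for start in range(len(text)):
        root["count"] += 1
        node = root
        for sign in text[start:start + max_n]:
            node = node["children"].setdefault(sign, new_node())
            node["count"] += 1
    return root

def ranked_transitions(node):
    """Children as (sign, tendency) pairs, strongest transition first."""
    total = node["count"]
    pairs = [(sign, child["count"] / total)
             for sign, child in node["children"].items()]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

# Toy corpus (an assumption for illustration only).
trie = build_ngram_trie("abracadabra", 3)
a_node = trie["children"]["a"]
print(ranked_transitions(a_node))
print(ranked_transitions(a_node["children"]["b"]))
```

Because every level reuses the counts of the level above it, extending the trie from depth n - 1 to depth n only requires tracing one further sign per position, which is the self-similarity the text describes.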
4.3    The  semantic  space 

Based  upon  the  language  entity  word,  its  different 
types,  and  their  frequencies  of  occurrence  in  natural 
language  discourse,  the  fundamental  distinction  of  ag- 
glomerative or  syntagmatic  and  selective  or  paradig- 
matic relatedness  can  also  be  employed  to  reconstruct 
a  system  structure  which  will  serve  as  base  for  a  proce- 
dural model  generating  tree-like  representations  of  dy- 
namic semantic  constraints.  As  these  techniques  have 
been  introduced  and  elaborated  elsewhere  [9]  [15]  [20], 
their  concise  description  may  suffice  here. 

The core of the representational formalism can be
characterized as a two-level process of abstraction.
The first (called α-abstraction) on the set of fuzzy
subsets of the vocabulary provides the word-types'
usage regularities or corpus points, the second (called
δ-abstraction) on this set of fuzzy subsets of corpus
points provides the corresponding meaning points as a
function of all differences of all usage regularities which
a set of word-types may produce by its word-tokens'
frequencies as observed in pragmatically homogeneous
corpora of natural language texts.

The basically descriptive statistics used to specify
intensities of co-occurring lexical items in texts is centred
around the correlational measure

$$\alpha(x_i, x_j) = \frac{\sum_{t=1}^{T} (h_{it} - e_{it})(h_{jt} - e_{jt})}{\left( \sum_{t=1}^{T} (h_{it} - e_{it})^2 \sum_{t=1}^{T} (h_{jt} - e_{jt})^2 \right)^{1/2}}; \quad -1 \leq \alpha(x_i, x_j) \leq +1 \tag{5}$$

where $e_{it} = \frac{H_i}{L} l_t$ and $e_{jt} = \frac{H_j}{L} l_t$, computed over a
text corpus $K = \{k_t\}; t = 1, \ldots, T$ having an overall
length $L = \sum_{t=1}^{T} l_t; 1 \leq l_t \leq L$ measured by the
number of word-tokens per text, and a vocabulary
$V = \{x_n\}; n = 1, \ldots, i, j, \ldots, N$ of word-types whose
frequencies are denoted by $H_i = \sum_{t=1}^{T} h_{it}; 0 \leq h_{it} \leq H_i$.

To specify these correlational value distributions'
differences, a measure of similarity (or rather,
dissimilarity) is used

$$\delta(y_i, y_j) = \left( \sum_{n=1}^{N} (\alpha(x_i, x_n) - \alpha(x_j, x_n))^2 \right)^{1/2}; \quad 0 \leq \delta(y_i, y_j) \leq 2\sqrt{N} \tag{6}$$

Figure 5: DDS-tree representation of meaning of COMPUTER as assembled for relevant meaning points' distances
(first value) and criterialities (second value) on the basis of the semantic space $\langle S, \zeta \rangle$ intermediate as computed
from a subcorpus of German newspaper texts (Die Welt, 1964 Berlin Edition).
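The two measures of Eqns. 5 and 6 can be written out directly. The following is a minimal sketch, assuming that each word-type is given as a list of token frequencies per text and that text lengths are supplied separately; the function names and the three-text toy data are illustrative assumptions, not taken from the paper.

```python
import math

def alpha(h_i, h_j, lengths):
    """Correlation coefficient of Eqn. 5 for two word-types.

    h_i[t], h_j[t] are the token frequencies in text t; lengths[t] is l_t.
    """
    total_len = sum(lengths)                      # L
    big_h_i, big_h_j = sum(h_i), sum(h_j)         # H_i, H_j
    # Deviations h_it - e_it with e_it = (H_i / L) * l_t
    d_i = [h - big_h_i * l / total_len for h, l in zip(h_i, lengths)]
    d_j = [h - big_h_j * l / total_len for h, l in zip(h_j, lengths)]
    numerator = sum(a * b for a, b in zip(d_i, d_j))
    denominator = math.sqrt(sum(a * a for a in d_i) * sum(b * b for b in d_j))
    # Assumption: a word-type spread exactly proportionally gets alpha = 0.
    return numerator / denominator if denominator else 0.0

def delta(alpha_row_i, alpha_row_j):
    """Euclidean distance of Eqn. 6 between two rows of alpha-values."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(alpha_row_i, alpha_row_j)))

# Toy data: three texts of 10, 10 and 20 tokens (made up for illustration).
lengths = [10, 10, 20]
same = alpha([2, 0, 2], [2, 0, 2], lengths)       # identical distributions
opposite = alpha([2, 0, 2], [0, 2, 2], lengths)   # complementary distributions
print(same, opposite)
print(delta([1.0, -1.0], [-1.0, 1.0]))            # reaches the bound 2 * sqrt(N) for N = 2
```

The expected frequency $e_{it}$ scales a word-type's overall frequency by the share of text t in the corpus, so the coefficient measures whether two word-types deviate from their expected frequencies in the same texts and in the same direction.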


The consecutive application of (Eqn. 7) on input
texts and (Eqn. 9) on the output data of (Eqn. 7)
makes it possible to model the meanings of words as a function of
differences of usage regularities (Fig. 6).

Thus, $\alpha_{ij}$ expresses pairwise relatedness of
word-types $(x_i, x_j) \in V \times V$ in numerical values ranging
from -1 to +1 by calculating co-occurring word-token
frequencies (Eqn. 5) for pairs of items.
As a fuzzy binary relation, $\tilde{\alpha} : V \times V \to I$ can be
conditioned on $x_n \in V$ which yields a crisp mapping

$$\tilde{\alpha} \mid x_n : V \to C; \quad \{y_n\} =: C \tag{7}$$

where the tuples $\langle (x_{n,1}, \tilde{\alpha}(n,1)), \ldots, (x_{n,N}, \tilde{\alpha}(n,N)) \rangle$
represent the numerically specified, syntagmatic usage
regularities that have been observed for each word-type
$x_i$ against all other $x_n \in V$. α-abstraction over one of
the components in each ordered pair defines

$$x_i(\tilde{\alpha}(i,1), \ldots, \tilde{\alpha}(i,N)) =: y_i \in C \tag{8}$$

Hence, the regularities of usage of any lexical item will
be determined by the tuple of its α-values which for all
word-types can be represented as vector space C.


Figure 6: Fuzzy mapping relations $\tilde{\alpha}$ and $\tilde{\delta}$ between the
structured sets of vocabulary items $x_n \in V$, of corpus
points $y_n \in C$, and of meaning points $z_n \in S$.

Considering C as a representational structure of
abstract entities constituted by syntagmatic regularities
of word-token occurrences in pragmatically
homogeneous discourse, the similarities and/or
dissimilarities of these entities will capture their corresponding
word-types' paradigmatic regularities, calculated by δ
(Eqn. 6) serving as second mapping function. As a fuzzy
binary relation, $\tilde{\delta} : C \times C \to I$ can be conditioned on
$y_n \in C$ which again yields a crisp mapping

$$\tilde{\delta} \mid y_n : C \to S; \quad \{z_n\} =: S \tag{9}$$

where the tuples $\langle (y_{n,1}, \tilde{\delta}(n,1)), \ldots, (y_{n,N}, \tilde{\delta}(n,N)) \rangle$
represent the numerically specified paradigmatic
structure that has been derived for each abstract
syntagmatic usage regularity $y_j$ against all other $y_n \in C$.
The distance values can therefore be abstracted
analogous to Eqn. 8, this time, however, over the other of
the components in each ordered pair, thus defining an
element $z_j \in S$ called meaning point by

$$y_j(\tilde{\delta}(j,1), \ldots, \tilde{\delta}(j,N)) =: z_j \in S \tag{10}$$

Identifying $z_n \in S$ with the numerically specified
elements of potential paradigms, the set of possible
combinations $S \times S$ may structurally be constrained and
evaluated without (direct or indirect) recourse to any
pre-existent external world. Introducing a Euclidean
metric

$$\zeta : S \times S \to I \tag{11}$$

the hyperstructure $\langle S, \zeta \rangle$ or semantic space (SS) is
declared, constituting the system of meaning points as an
empirically founded and functionally derived
representation of a lexically labelled knowledge structure.
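The two-level abstraction of Eqns. 7 to 11 can be sketched as a small pipeline from token frequencies to corpus points, meaning points, and distances in the semantic space. This is an illustrative sketch only: the three-word vocabulary and its frequencies are made up, the helper names are hypothetical, and a zero denominator in the correlation is mapped to 0 as an assumption.

```python
import math

def correlation(h_i, h_j, lengths):
    """alpha of Eqn. 5 for two word-types given per-text token frequencies."""
    total_len = sum(lengths)
    d_i = [h - sum(h_i) * l / total_len for h, l in zip(h_i, lengths)]
    d_j = [h - sum(h_j) * l / total_len for h, l in zip(h_j, lengths)]
    denominator = math.sqrt(sum(a * a for a in d_i) * sum(b * b for b in d_j))
    if not denominator:
        return 0.0   # assumption: no deviation means no correlation
    return sum(a * b for a, b in zip(d_i, d_j)) / denominator

def euclid(u, v):
    """Euclidean distance, used both for delta (Eqn. 6) and zeta (Eqn. 11)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def semantic_space(freqs, lengths):
    """Return corpus points C (Eqn. 8) and meaning points S (Eqn. 10)."""
    words = sorted(freqs)
    corpus_points = {w: [correlation(freqs[w], freqs[v], lengths) for v in words]
                     for w in words}
    meaning_points = {w: [euclid(corpus_points[w], corpus_points[v]) for v in words]
                      for w in words}
    return corpus_points, meaning_points

def neighbours(meaning_points, word):
    """Other word-types ordered by increasing zeta-distance from word."""
    return sorted((euclid(meaning_points[word], meaning_points[v]), v)
                  for v in meaning_points if v != word)

# Made-up token frequencies over four texts of equal length.
freqs = {"computer": [3, 0, 2, 0], "program": [2, 0, 3, 0], "garden": [0, 3, 0, 2]}
C, S = semantic_space(freqs, [10, 10, 10, 10])
print(neighbours(S, "computer"))
```

In this toy example "program" ends up closest to "computer" because both deviate from their expected frequencies in the same texts, while "garden" does so in the complementary ones. No external word meanings enter the computation at any point.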

Weighted numerically as a function of an
element's distance values and its associated node's level
and position in the tree, $Cr(z_i)$ either is an
expression of the head-node's $z_i$ meaning-dependencies on
the daughter-nodes $z_n$ or, inversely, expresses their
meaning-criterialities adding up to an aspect's
interpretation determined by that head [15]. To illustrate
the feasibility of the A-operation's generative
procedure, the substructure of relevant constraints (related
meaning points) $DDS(z_i) \subseteq \langle S, \zeta \rangle$ anchored with the
lexical item $x_i$, i = COMPUTER, is shown in Fig. 5.

5  Conclusion 

It has been outlined here that the morphic sign or the
semantic meaning functions' ranges may be computed and
simulated as a result of exactly those (semiotic) procedures
by way of which (representational) structures emerge and
their (interpreting) actualisation is produced from observing
and analyzing the domain's possibilistically determined
constraints as imposed on the linear ordering (syntagmatics)
and the selective combination (paradigmatics) of natural
language entities (morph-types, word-types) in communicative
language performance. For fuzzy linguistic morphology and
lexical semantics this is tantamount to (re-)presenting an
entity's semiotic potential (function, meaning) by a fuzzy
distributional pattern of the modelled system's state rather
than a single symbol. The representational system's dynamic
structure modeled by the procedures outlined is to represent
a semiotic cognitive information processing system's
interpretation of its environment.
Abstract

The Linguistic Geometry introduced in this paper includes
mathematical tools for knowledge representation and
reasoning about multiagent discrete pursuit-evasion games.
This class of games is an adequate mathematical model for
real world combat operations, particularly for the air
force and navy problem domains. Linguistic Geometry relies
on the formalization of search heuristics, which allow one
to decompose the game into a hierarchy of images
(subsystems), and thus solve otherwise intractable problems
by reducing the search dramatically. These hierarchical
images, extracted in the form of networks of paths from the
expert vision of the problem, are formalized as a hierarchy
of formal languages. An example of a simplified
four-aircraft pursuit-evasion game is considered. While the
solution of this problem was presented in other
publications, in this paper we prove optimality of the
solution. Based on this example we can suggest that for a
certain class of multiagent problems Linguistic Geometry
tools generate optimal solutions.

1  Background 

Problems of long and short-range mission simulation,
especially for autonomous navigation, aerospace robot
control (such as control of unmanned aerial vehicles, UAVs),
aerospace combat operations control, etc., are usually
described mathematically in the form of pursuit-evasion
differential games. An example of such a problem is the
real time control of air combat in which a number of planes
(manned or unmanned) equipped with so-called countermeasures
evade a number of pursuers equipped with missiles. Another
example is the problem of optimal control of UAVs which are
in a reconnaissance flight to locate mobile missile
launchers. The actual launch points are usually detected by
a satellite based sensor. The UAVs use detected launch
points to initiate their search, locate the launchers, and
possibly destroy them. In a real world scenario the UAV
control should be considered together with the air combat,
when UAVs evade pursuing enemy aircraft. Similar problems
for the development and real time replanning of combat
scenarios are essential for the Navy and Army battlefields.

The classic approach based on the conventional theory of
Differential Games is insufficient, especially in the case
of dynamic, multiagent models (Garcia-Ortiz et al., 1993).
It is well known that there exists a small number of
differential games for which exact analytical solutions are
available. There are a few more for which numerical
solutions can be computed, under rather restrictive
conditions, in a reasonable amount of time. Worse still,
each of these games is one-to-one, which is very far from
real world combat scenarios. They are also of the "zero-sum
type", which does not allow a new agent to join the game or
all the agents of both sides to be disengaged. Other
difficulties arise from the requirements of 3D modeling and
from the limited lifetime of the agents.

Following (Rodin, 1988; Shinar, 1990), discrete-event
modeling of complex control systems can be implemented as a
purely interrogative discrete simulation. These techniques
can be based on generating geometrically meaningful states
rather than time increments, with due respect to the
timeliness of actions. By discretizing time, a finite game
tree can be obtained. The nodes of the tree represent the
states of the game, where the players can select their
controls for a given period of time. It is also possible to
distinguish the respective moves of the two sides (including
simultaneous actions). Thus, the branches of the tree are
the moves in the game space. The pruning of such a tree is
the basic task of heuristic search techniques. The
interrogative approach to control problems offers much
faster execution and clearer simulator definition (Lirov et
al., 1988).

In the beginning of the 1980s Botvinnik, Stilman, and
others developed one of the most interesting and powerful
heuristic hierarchical models based on semantic networks.
It was successfully applied to scheduling, planning,
control, and computer chess. Application of the developed
model to the chess domain was implemented in full as the
program PIONEER (Botvinnik, 1984). A similar heuristic model
was implemented for power equipment maintenance in a number
of computer programs being used for maintenance scheduling
all over the former USSR (Botvinnik et al., 1983; Stilman,
1985, 1993a). The semantic networks were introduced in
(Botvinnik, 1984; Stilman, 1977) in the form of ideas,
plausible discussions, and program implementations.

2     A  Linguistic  Geometry:  Informal  Survey 

A formal theory, the Linguistic Geometry - LG (Stilman,
1992-96), includes the syntactic tools for knowledge
representation and reasoning about multiagent discrete
pursuit-evasion games. This approach provides us with an
opportunity to transfer formal properties and constructions
from one problem to another and to reuse tools in a new
problem domain. In a sense, it is the application of the
methods of a chess expert to robot control or maintenance
scheduling and vice versa.

Linguistic Geometry has been developed as a generic
approach to a certain class of complex systems that involves
breaking down a system into dynamic subsystems. We generate
a new multi-goal, multi-level system and substitute it for
the original one-goal, one-level system by introducing
intermediate goals and breaking the system down into
subsystems striving to attain these goals.

A  set  of  dynamic  subsystems  is  represented  as  a 
hierarchy  of  formal  languages.  Various  examples  of 
problems  solved  employing  LG  tools  have  been  published 
(Stilman,  1992-1996).  During  the  entire  history  of  the 
development  of  LG,  we  were  always  concerned  about 
approximate  solutions,  almost  winning  strategies,  and  how 
to measure their accuracy. This was like an axiom: in
general, heuristics do not generate an optimum, and even if
they do, it is usually hard to prove that this is an optimum.

This  paper  is  devoted  to  a  proof  of  optimality  of  the 
solution  generated  by  LG  algorithms  for  a  class  of 
multiagent  problems.  We  will  show  how  the  heuristic 
algorithm  generates  a  solution  employing  a  very  small 
search  tree  (Section  8).  Then,  in  Section  9,  we  will  prove 
that  this  solution  is  optimal. 

3     A  Class  of  Problems 

A Complex System is the following eight-tuple:
< X, P, R_p, {ON}, v, S_i, S_t, TR >, where
X = {x_i} is a finite set of points; locations of elements;
P = {p_i} is a finite set of elements; P = P_1 ∪ P_2,
P_1 ≠ ∅, P_2 ≠ ∅;
R_p(x, y) is a set of binary relations of reachability in X
(x and y ∈ X, p ∈ P);
ON(p) = x, where ON is a partial function of placement from
P into X;
v is a function on P with positive integer values describing
the values of elements. The Complex System searches
the state space that has initial and target states;
S_i and S_t are the descriptions of the initial and target
states in the language of the first order predicate
calculus, which matches with each relation a certain
Well-Formed Formula (WFF). Thus, each state from S_i or
S_t is described by a certain set of WFF of the form
{ON(p_j) = x_k};
TR is a set of operators, TRANSITION(p, x, y), of
transitions of the System from one state to another one.
These operators describe the transition in terms of two
lists of WFF (to be removed from and added to the
description of the state), and of WFF of applicability of
the transition. Here, Remove list: ON(p) = x, ON(q) = y;
Add list: ON(p) = y; Applicability list: (ON(p) = x) ∧
R_p(x, y), where p ∈ P_1 and q ∈ P_2 or vice versa. The
transitions are carried out with participation of a
number of elements p from P_1, P_2.

According to the definition of the set P, the elements of
the System are divided into two subsets P_1 and P_2. They
might be considered as units moving along the reachable
points. Element p can move from point x to point y if these
points are reachable, i.e., R_p(x, y) holds. The current
location of each element is described by the equation
ON(p) = x. Thus, the description of each state of the System
{ON(p_j) = x_k} is the set of descriptions of the locations
of elements. The operator TRANSITION(p, x, y) describes the
change of the state of the System caused by the move of the
element p from point x to point y. The element q from point
y must be withdrawn (eliminated) if p and q do not belong
to the same subset (P_1 or P_2).
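
The definition above can be illustrated with a minimal Python sketch. It is not taken from the LG publications: the class name ComplexSystem, the function transition, and the dictionary encoding of a state are assumptions made for illustration only. The paper does not say what happens when the destination point is occupied by an element of the same subset; the sketch assumes that such a move is simply not applicable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional

State = Dict[str, str]  # maps element name p to its point x, i.e. ON(p) = x


@dataclass(frozen=True)
class ComplexSystem:
    points: FrozenSet[str]                       # X
    side1: FrozenSet[str]                        # P_1
    side2: FrozenSet[str]                        # P_2
    reachable: Callable[[str, str, str], bool]   # R_p(x, y) as reachable(p, x, y)


def transition(system: ComplexSystem, state: State,
               p: str, x: str, y: str) -> Optional[State]:
    """Apply TRANSITION(p, x, y); return the new state, or None if not applicable."""
    # Applicability list: (ON(p) = x) and R_p(x, y).
    if state.get(p) != x or not system.reachable(p, x, y):
        return None
    own_side = system.side1 if p in system.side1 else system.side2
    new_state: State = {}
    for q, location in state.items():
        if q == p:
            continue          # Remove list: ON(p) = x
        if location == y:
            if q in own_side:
                return None   # assumption: a friendly element blocks the move
            continue          # Remove list: ON(q) = y for an opposing element q
        new_state[q] = location
    new_state[p] = y          # Add list: ON(p) = y
    return new_state
```

The function returns a new dictionary and leaves the input state untouched, which keeps backtracking in a search tree straightforward.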

The problem of the optimal operation of the System is
considered as a search for the optimal sequence of
transitions leading from the initial state of S_i to a
target state of S_t. To solve this class of problems, we
could use formal methods like those in the problem-solving
or planning systems (like STRIPS or subsequent). However,
the search would have to be made in a space of a huge
dimension (for nontrivial examples). Thus, in practice, no
solution would be obtained. We devote ourselves to finding
a solution of a reformulated problem.

4  A  Set  of  Paths:  Language  of  Trajectories 

A trajectory for an element p of P with the beginning at
x of X and the end at y of X (x ≠ y) with a length l is the
following formal string of symbols a(x) with points of X as
parameters: t_0 = a(x)a(x_1)...a(x_l), where x_l = y, each
successive point x_{i+1} is reachable from the previous
point x_i, i.e., R_p(x_i, x_{i+1}) holds for i = 0, 1, ...,
l-1 (with x_0 = x); element p stands at the point x:
ON(p) = x. We denote by t_p(x, y, l) the set of all
trajectories for element p, beginning at x, end at y, and
with length l.

A shortest trajectory t of t_p(x, y, l) is the trajectory of
the minimum length for the given beginning x, end y, and
element p.

A Language of Trajectories L_t^H(S) for the Complex
System in a state S is the set of all the trajectories of
length less than or equal to H. Various properties of this
language and generating grammars were investigated in
(Stilman, 1993a).

Examples  of  distance  measurements  and  trajectory 
generation  for  robotic  vehicles  are  considered  in  (Stilman, 
1993a,  1993c,  1994b). 
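
As an illustration of the definitions in this section, the following Python sketch enumerates all shortest trajectories between two points. It uses plain breadth-first search rather than the generating grammars of (Stilman, 1993a); the function names and the chess-like square encoding ("h8", "g7", ...) are assumptions made for this example. A trajectory a(x)a(x_1)...a(x_l) is encoded as the list [x, x_1, ..., x_l], so its length l is the number of points minus one.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

FILES = "abcdefgh"


def king_moves(square: str) -> List[str]:
    """Squares reachable in one step by an element that moves to any adjacent square."""
    f, r = FILES.index(square[0]), int(square[1])
    return [FILES[f + df] + str(r + dr)
            for df in (-1, 0, 1) for dr in (-1, 0, 1)
            if (df or dr) and 0 <= f + df < 8 and 1 <= r + dr <= 8]


def shortest_trajectories(start: str, end: str,
                          neighbors: Callable[[str], Iterable[str]]) -> List[List[str]]:
    """Return every shortest trajectory from start to end as a list of points."""
    # Breadth-first search gives the minimum number of steps to each point.
    dist: Dict[str, int] = {start: 0}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in neighbors(x):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    if end not in dist:
        return []

    # Walk back from the end point through points one step closer to start.
    def extend(path: List[str]) -> List[List[str]]:
        head = path[0]
        if head == start:
            return [path]
        found: List[List[str]] = []
        for x, d in dist.items():
            if d == dist[head] - 1 and head in neighbors(x):
                found.extend(extend([x] + path))
        return found

    return extend([end])
```

For example, shortest_trajectories("h8", "f6", king_moves) yields the single trajectory a(h8)a(g7)a(f6), while two shortest trajectories lead from h8 to h6.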

5  Networks of Paths: Languages of Trajectory Networks

An example of such a network is shown in Fig. 2. The
basic idea behind these networks is as follows. Element p_0
should move along the main trajectory a(1)a(2)a(3)a(4)a(5)
to reach the ending point 5 and remove the target (an
opposing element). Naturally, the opposing elements should
try to disturb those motions by controlling the intermediate
points of the main trajectory. They should come closer to
these points (to the point 4 in Fig. 2) and remove element
p_0 after its arrival (at point 4). For this purpose,
elements q_3 or q_2 should move along the trajectories
a(6)a(7)a(4) and a(8)a(9)a(4), respectively, and wait (if
necessary) at the next to last point (7 or 9) for the
arrival of element p_0 at point 4. Similarly, element p_1 of
the same side as p_0 might try to disturb the motion of q_2
by controlling point 9 along the trajectory a(13)a(9). It
makes sense for the opposing side to include the trajectory
a(11)a(12)a(9) of element q_1 to prevent this control.


Fig. 2. Network language interpretation.

A trajectory connection of the trajectories t_1 and t_2 is
the relation C(t_1, t_2). It holds if the ending link of the
trajectory t_1 coincides with an intermediate link of the
trajectory t_2; more precisely, t_1 is connected with t_2 if
among the parameter values P(t_2) = {y, y_1, ..., y_l} of
trajectory t_2 there is a value y_i = x_k, where
t_1 = a(x_0)a(x_1)...a(x_k). If t_1 belongs to a set of
trajectories with the common end-point, then the entire set
is said to be connected with the trajectory t_2.
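
The connection relation can be checked mechanically. The Python sketch below follows the "more precisely" part of the definition and tests whether the end point of t_1 occurs among the parameter values of t_2. The list encoding of trajectories and the name is_connected are assumptions made for illustration; the point numbers are those of the network in Fig. 2.

```python
from typing import List, Sequence


def is_connected(t1: Sequence[int], t2: Sequence[int]) -> bool:
    """C(t1, t2): the end point of t1 is among the parameter values P(t2)."""
    return len(t1) > 0 and t1[-1] in set(t2)


# Trajectories of the network in Fig. 2, written as lists of point numbers.
main: List[int] = [1, 2, 3, 4, 5]      # a(1)a(2)a(3)a(4)a(5), element p_0
q3_path: List[int] = [6, 7, 4]         # a(6)a(7)a(4)
q2_path: List[int] = [8, 9, 4]         # a(8)a(9)a(4)
p1_path: List[int] = [13, 9]           # a(13)a(9)
```

With these lists, q3_path and q2_path are connected with main (both end at point 4), and p1_path is connected with q2_path (it ends at point 9) but not with main.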

A formal definition of the Language of Zones (based
on the trajectory connection) is given in (Stilman, 1993b,
1993c, 1994a). The Zone corresponding to the trajectory
network in Fig. 2 is represented as follows.

Z = t(p_0, a(1)a(2)a(3)a(4)a(5), 5) t(q_3, a(6)a(7)a(4), 4)
    t(q_2, a(8)a(9)a(4), 4) t(p_1, a(13)a(9), 1)
    t(q_1, a(11)a(12)a(9), 3) t(p_2, a(10)a(12), 1)

A language L_Z^H(S) generated by the certain grammar
G_Z (Stilman, 1993b, 1993c, 1994a) in a state S of a
Complex System is called the Language of Zones.

6    A  Complex  System  of  Robotic  Vehicles 

For the robotic model the set X of the Complex System
(Section 3) represents the operational district, which could
be the area of combat operation, broken into smaller 2D or
3D areas, "points", e.g., in the form of the 2D or 3D grid. P
is the set of robots or autonomous vehicles. It is broken
into two subsets P_1 and P_2 with opposing interests;
R_p(x, y) represent moving capabilities of various robots for
various problem domains: robot p can move from point x to
point y if R_p(x, y) holds. Analogously, we can represent the
rest of the parameters of the Complex System.

7    2D  Model:  Problem  Statement 

Robots with various moving capabilities are shown in
Fig. 3. The operational district X is the table 8x8. Robot
W-FIGHTER (White Fighter), standing on h8, can move to any
next square (shown by arrows). The other robot, B-BOMBER
(Black Bomber), from h5 can move only straight ahead, one
square at a time, e.g., from h5 to h4, from h4 to h3, etc.
Robot B-FIGHTER (Black Fighter), standing on a6, can move to
any next square similarly to robot W-FIGHTER (shown by
arrows). Robot W-BOMBER, standing on c6, is analogous to the
robot B-BOMBER; it can move only straight ahead but in the
reverse direction. Thus, robot W-FIGHTER on h8 can reach any
of the points y ∈ {h7, g7, g8} in one step, i.e.,
R_{W-FIGHTER}(h8, y) holds, while W-BOMBER can reach only c7
in one step. Assume that robots W-FIGHTER and W-BOMBER
belong to one side, while B-FIGHTER and B-BOMBER belong to
the opposing side: W-FIGHTER ∈ P_1, W-BOMBER ∈ P_1,
B-FIGHTER ∈ P_2, B-BOMBER ∈ P_2.
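
The moving capabilities described above can be written down as a reachability relation. The following Python sketch is one possible encoding, not the authors' implementation: the function name reachable, the string encoding of squares, and the treatment of any unlisted element as immobile are assumptions made for illustration.

```python
FILES = "abcdefgh"
SQUARES = [f + str(r) for f in FILES for r in range(1, 9)]  # the 8x8 district X


def reachable(robot: str, x: str, y: str) -> bool:
    """R_p(x, y) for the 2D model: can robot p move from x to y in one step?"""
    df = FILES.index(y[0]) - FILES.index(x[0])   # change of file (a..h)
    dr = int(y[1]) - int(x[1])                   # change of rank (1..8)
    if robot in ("W-FIGHTER", "B-FIGHTER"):
        # FIGHTERs move to any next (adjacent) square.
        return max(abs(df), abs(dr)) == 1
    if robot == "W-BOMBER":
        # W-BOMBER moves straight ahead, one square at a time (c6 to c7).
        return df == 0 and dr == 1
    if robot == "B-BOMBER":
        # B-BOMBER moves straight ahead in the opposite direction (h5 to h4).
        return df == 0 and dr == -1
    return False  # assumption: any other element does not move


def one_step(robot: str, x: str) -> set:
    """All points y of the district with R_p(x, y)."""
    return {y for y in SQUARES if reachable(robot, x, y)}
```

Here one_step("W-FIGHTER", "h8") gives {h7, g7, g8}, and one_step("W-BOMBER", "c6") gives {c7}, in agreement with the text.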


Fig. 3. 2D problem.

Also assume that two more robots, W-TARGET and B-TARGET
(unmoving devices or target areas), stand on h1 and c8,
respectively. W-TARGET belongs to P_1, while B-TARGET ∈ P_2.
Each of the BOMBERs can destroy an unmoving TARGET ahead of
the course; it also has powerful weapons capable of
destroying opposing FIGHTERs on the next diagonal squares
ahead of the course. For example, W-BOMBER from c6 can
destroy opposing FIGHTERs on b7 and d7. Each of the FIGHTERs
is capable of destroying an opposing BOMBER approaching its
location, but it is also capable of protecting its friendly
BOMBER approaching its prospective location. In the latter
case the joint protective power of the combined weapons of
the friendly BOMBER and FIGHTER can protect the BOMBER from
interception. For example, W-FIGHTER located at d6 can
protect W-BOMBER on c6 and c7.

The combat considered can be broken into two local
operations. The first operation is as follows: robot
B-BOMBER should reach point h1 to destroy the W-TARGET,
while W-FIGHTER will try to intercept this motion. The
second operation is similar: robot W-BOMBER should reach
point c8 to destroy the B-TARGET, while B-FIGHTER will try
to intercept this motion. After destroying the opposing
TARGET the attacking side is considered a winner of the
local operation and the global battle. The only chance for
the opposing side to avenge is to hit its TARGET on the next
time increment and this way end the battle in a draw. The
conditions considered above give us S_t, the description of
target states of the Complex System. The description of the
initial state S_i is obvious and follows from Fig. 3.

Assume that motions of the opposing sides alternate
and, due to the shortage of resources (which is typical in a
real combat operation) or some other reasons, each side
cannot participate in both operations simultaneously. It
means that during the current time interval, on White's
turn, either W-BOMBER or W-FIGHTER can move. An analogous
condition holds for Black. Of course, it does not mean that
if one side began participating in one of the operations it
must complete it. Any time on its turn each side can switch
from one operation to another, e.g., transferring resources
(fuel, weapons, human resources, etc.), and later switch
back.

Both  restrictions  have  been  relaxed  and  completely 
eliminated  in  (Stilman,  1995c,  1995d,  1996b). 

It seems that local operations are independent, because
they are located far from each other. Moreover, the
operation of B-BOMBER from h5 looks like an
unconditionally winning operation, and, consequently, the
global battle can be easily won by the Black side.

Is  there  a  strategy  for  the  White  side  to  make  a  draw? 
The  specific  formal  question  is  as  follows:  Is  there  an 
optimal  strategy  that  provides  one  of  the  following? 

1.  Both  BOMBERs  hit  their  targets  on  subsequent  time 
increments  and  stay  safe  for  at  least  one  time 
increment. 

2.  Both  BOMBERs  are  destroyed  before  they  hit  their 
targets  or  immediately  after  that. 

We answer this question in Sections 8-9. Of course, it
can be answered by the direct search employing, for
example, the minimax algorithm with alpha-beta cut-offs.
Experiments with the computer programs showed that in
order to solve this problem employing conventional
approaches the search tree should include about a million
moves (transitions). Consider how the Hierarchy of
Languages works for the optimal control of this Robotic
System (Fig. 3).


8     2D  Model:  Search 

We  generate  a  string  of  the  Language  of  Translations 
representing  it  as  a  conventional  search  tree  (Fig.  5)  and 
comment  on  its  generation. 

First, the Language of Zones in the start state is
generated. The targets for attack are determined within a
limited number of steps, which is called a horizon. In
general, the value of the horizon is unknown. As a rule, this
value can be determined from the experience of solving
specific classes of problems employing Linguistic Geometry
tools. In the absence of such experience, we first have to
consider the value of 1 as a horizon, and solve the problem
within this value. If we still have resources available,
i.e., computer time, memory, etc., we can increase the
horizon by one. After each increase we have to regenerate
the entire model. This increase means a new level of
"vigilance" of the model, and, consequently, a new, greater
need for resources.


Fig.  4.  Zones  in  the  initial  state. 

In our case it is easy to show that within the horizons of
1, 2, 3, 4 all the models are "blind" and the corresponding
searches do not give a "reasonable" solution. But, again,
after application of each of the consecutive values of the
horizon we will have a solution which can be considered as
an approximate solution within the available resources.
Thus, let the horizon H of the language L_Z(S) be equal to
5, i.e., the length of main trajectories of all Zones must
not exceed 5 steps. All the Zones generated in the start
state are shown in Fig. 4. Zones for the FIGHTERs as
attacking elements are shown in the left diagram, while
Zones for BOMBERs are shown in the right one.

Generation  begins  with  the  move  1.  c6-c7  in  the  White 
Zone  with  the  target  of  the  highest  value  and  with  the 
shortest  main  trajectory.  The  order  of  consideration  of 
Zones  and  particular  trajectories  is  determined  by  the 
grammar  of  translations.  Next  move,  1.  ...  a6-b7,  is  in  the 
same  Zone  along  the  first  negation  trajectory.  Interception 
continues:  2.  c7-c8  b7:c8  (Fig.  6,  left).  Symbol  ":"  means 
the  removal  of  element.  Here  the  grammar  cuts  this  branch 
with  the  value  of  -1  (as  a  win  of  the  Black  side). 

Then, the grammar initiates the backtracking climb.
Each backtracking move is followed by the inspection
procedure, the analysis of the subtree generated in the
process of the earlier search. After the climb up to the
move 1. ... a6-b7, the tree to be analyzed consists of one
branch (of two plies): 2. c7-c8 b7:c8. The inspection
procedure determined that the current minimax value (-1)
can be "improved" by the improvement of the exchange on
c8 (in favor of the White side). This can be achieved by
participation of W-FIGHTER from h8, i.e., by generation
and inclusion of the new so-called "control" Zone with
main trajectory from h8 to c8. The set of different Zones
from h8 to c8 (the bundle of Zones) is shown in Fig. 6,
right. The move-ordering procedure picks the subset of
Zones with main trajectories passing g7. These trajectories
partly coincide with the main trajectory of another Zone
attacking the opposing B-BOMBER on h5. The motion
along such trajectories allows one to "gain time", i.e., to
approach two goals simultaneously.
The generation continues: 2. h8-g7 b7:c7. Again, the
procedure of "square rules" cuts the branch, evaluates it as
a win of the Black side, and the grammar initiates the climb.
Analogously to the previous case, the inspection procedure
determined that the current minimax value (-1) can be
improved by the improvement of the exchange on c7.
Again, this can be achieved by the inclusion of a Zone from
h8 to c7. Of course, the best "time-gaining" move in this
Zone is 2. h8-g7, but it was already included (as a move in
the Zone from h8 to c8), and it appeared to be useless. No
other branching at this state is generated.
Fig. 5. Search tree for the robotic system.

The inspection procedure does not find new Zones to
improve the current minimax value, and the climb continues
up to the start state. The analysis of the subtree shows that
inclusion of the Zone from h8 to c8 in the start state can be
useful: the minimax value can be improved. Similarly, the
most promising "time-gaining" move is 1. h8-g7. The Black
side responded 1. ... a6-b6 along the first negation
trajectory a(a6)a(b6)a(c7) (Fig. 4, left). Note that the
grammar "knows" that in this state trajectory
a(a6)a(b6)a(c7) is active, i.e., B-FIGHTER has enough
time for interception. The following moves are in the same
Zone of W-BOMBER: 2. c6-c7 b6:c7. This state is shown
in Fig. 7, left. The "square rule procedure" cuts this branch
and evaluates it as a win of the Black side.


Fig. 6. States where the control Zone from h8 to c8 was detected
(left) and where it was activated (right).

Fig. 7. States where the control Zone from g7 to c7 was detected
(left) and where it was activated (right).

A new climb up to the move 1. ... a6-b6 and execution of
the inspection procedure resulted in the inclusion of the new
control Zone from g7 to c7 in order to improve the
exchange on c7. The set of Zones with different main
trajectories from g7 to c7 is shown in Fig. 7, right. Besides
that, the trajectories from g7 to h4, h3, h2, and h1 are
shown in the same Fig. 7. These are "potential" intercepting
trajectories. It means that beginning with the second symbol
a(f6), a(g6) or a(h6) these trajectories become intercepting
trajectories (Section 5) in the Zone of B-BOMBER h5.
Speaking informally, from squares f6, g6, and h6 W-FIGHTER
can intercept B-BOMBER (in case of White's move). The
move-ordering procedure picks the subset of Zones with the
main trajectories passing f6. These trajectories partly
coincide with the potential first negation trajectories. The
motion along such trajectories allows one to gain time,
i.e., to approach two goals simultaneously. Thus, 2. g7-f6.
Proceeding with the search in this way, we will generate a
tree that consists of 46 moves.


9     2D  Model:  PROOF 

9.1  Terminal Sets

Let us prove that the optimal variant of the reduced
search tree shown in Fig. 5 is the optimal solution of this
problem. This means that we have to prove that this variant
is optimal not only within the reduced search tree but also
within the full search tree, which could be generated as a
result of exhaustive search. (Here and below in this paper
we use bold font to designate various sets.)

The  set  of  goal  states  for  this  problem  can  be  broken 
into  three  subsets: 

1. Subset W-Win is the set of states where B-BOMBER
is destroyed, and W-BOMBER hit B-TARGET and has
been safe for at least one time increment after that.

W-Win = BB-Destroyed ∩ WB-Safe

2. Subset B-Win is the set of states where W-BOMBER
is destroyed, and B-BOMBER hit W-TARGET and has
been safe for at least one time increment after that.

B-Win = WB-Destroyed ∩ BB-Safe

3. Subset Draw is the set of states where both BOMBERs
hit their targets on subsequent time increments and
have been safe for at least one time increment or both
BOMBERs are destroyed before this happens. The set
of states considered in this definition can be
represented as follows:

Draw = Safe ∪ Destroyed,

where

Destroyed = BB-Destroyed ∩ WB-Destroyed,
Safe = BB-Safe ∩ WB-Safe
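
The three terminal subsets are plain set expressions, which the following Python sketch reproduces on a toy state space. The encoding is an assumption made for illustration: a state is reduced to a pair (status of B-BOMBER, status of W-BOMBER), each status being "destroyed", "safe" (target hit and bomber safe for at least one time increment), or "undecided". Real states of the Complex System carry the full placement ON(p) = x.

```python
from itertools import product

STATUSES = ("destroyed", "safe", "undecided")

# Toy state space: (status of B-BOMBER, status of W-BOMBER).
states = set(product(STATUSES, repeat=2))

bb_destroyed = {s for s in states if s[0] == "destroyed"}   # BB-Destroyed
bb_safe = {s for s in states if s[0] == "safe"}             # BB-Safe
wb_destroyed = {s for s in states if s[1] == "destroyed"}   # WB-Destroyed
wb_safe = {s for s in states if s[1] == "safe"}             # WB-Safe

w_win = bb_destroyed & wb_safe          # W-Win = BB-Destroyed ∩ WB-Safe
b_win = wb_destroyed & bb_safe          # B-Win = WB-Destroyed ∩ BB-Safe
destroyed = bb_destroyed & wb_destroyed # Destroyed
safe = bb_safe & wb_safe                # Safe
draw = safe | destroyed                 # Draw = Safe ∪ Destroyed
```

In this encoding the three subsets are pairwise disjoint, and every state with an "undecided" bomber falls outside all of them, which matches the role of non-terminal states in the search.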

A  subtree  of  the  full  search  tree  is  called  optimal  if 
after  applying  the  minimax  algorithm  on  the  full  tree,  the 
minimax  value  of  the  root  node  is  equal  to  one  of  the  values 
of  the  terminal  nodes  of  this  subtree. 

Let  A  be  a  set  of  states.  The  strategy  is  called  an  A 
strategy  if  it  is  represented  by  the  optimal  subtree  with  the 
terminal  nodes  which  represent  states  from  A,  only.  Thus, 
for  the  W-Win  strategy,  the  corresponding  terminal  nodes 
belong  to  W-Win  only.  For  the  B-Win  strategy,  the 
terminal  nodes  should  belong  to  B-Win.  The  Draw 
strategy  is  represented  by  the  optimal  subtree  with  the 
terminal  nodes  from  Draw. 
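
The definition of an optimal subtree can be sketched in Python as follows. The nested-list encoding of the full tree, the convention that White (the maximizing side) moves at the root, and the function names are assumptions made for illustration. Terminal values follow the convention used in Section 8, with -1 for a win of the Black side; 0 for a draw and 1 for a win of the White side are assumed by analogy.

```python
from typing import Iterable, List, Union

Tree = Union[int, List["Tree"]]  # a terminal value, or a list of child subtrees


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Minimax value of a node; the sides alternate from ply to ply."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def is_optimal_subtree(full_tree: Tree, subtree_terminal_values: Iterable[int]) -> bool:
    """True if the root minimax value of the full tree equals the value of
    at least one terminal node of the subtree."""
    return minimax(full_tree) in set(subtree_terminal_values)


# A toy full tree: White chooses a branch, then Black chooses a reply.
toy_tree: Tree = [[0, 1], [-1, 0]]
```

For toy_tree the root value is max(min(0, 1), min(-1, 0)) = 0, so a subtree whose terminal nodes all have value 0 is optimal, while a subtree with only terminal value 1 is not. An A strategy additionally requires every terminal state of the subtree to lie in A.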

To prove that the bold subtree of the search tree shown
in Fig. 5 represents an optimal strategy for this problem,
and that this strategy gives a draw, we have to prove that it
represents a Draw strategy for this problem. This means
that the following Theorem holds:


Theorem 

1.  The  terminal  states  of  the  bold  subtree  belong  to  the 
Draw. 

2.  The  bold  subtree  is  optimal  (with  respect  to  the  full 
search  tree).  


Let  us  prove  statement  1. 


9.2  Terminal Sets Expansion

The terminal states of the bold subtree shown in Fig. 5
belong to none of the subsets considered above, although
they received exact values. Let us expand these state
subsets, trying to include the actual terminal states. We
shall begin with the expansion of the set Draw by
introducing a new set DrawExpand. By definition this is the
set of states such that for each of them a Draw strategy
exists. Ultimately, we have to prove that our start state
belongs to DrawExpand.

We  will  achieve  this  goal  by  investigating  the  structure 
of  this  set  employing  LG  tools. 

First,  we  will  consider  the  states  where  the  draw  can  be 
achieved  by  destroying  both  BOMBERs  (Fig.  8).  Let  BB- 
Intercept  be  the  set  of  states  where  for  each  of  the  states 
there  is  an  optimal  strategy  for  W-FIGHTER  to  intercept  B- 
BOMBER.  More  formally,  if  BB-Destroyed  is  the  set  of 
states  where  B-BOMBER  is  destroyed  (see  definition  of  the 
Draw),  then  BB-Intercept  is  the  set  of  states  where  BB- 
Destroyed  strategy  exists. 

Let WB-Intercept be the set of states where for each state there is an
optimal strategy for B-FIGHTER to intercept the W-BOMBER. More
formally, if WB-Destroyed is the set of states where W-BOMBER is
destroyed (see definition of Draw), then WB-Intercept is the set of
states where a WB-Destroyed strategy exists.


[Figure labels: Intercept, BB-Destroyed, WB-Destroyed, Destroyed]
Fig. 8. Expansion of terminal sets.

Consider the intersection of BB-Intercept and WB-Intercept,

Intercept = BB-Intercept ∩ WB-Intercept,

i.e., the set of states where the Destroyed strategy exists, taking
into account that

Destroyed = BB-Destroyed ∩ WB-Destroyed.

Obviously, Destroyed ⊂ DrawExpand; moreover, since every Destroyed
strategy is also a Draw strategy, Intercept ⊂ DrawExpand.

Analogously,  we  will  consider  the  states  where  the 
draw  can  be  achieved  by  hitting  the  TARGETS  and  saving 
both  BOMBERs. 

Let BB-Protect be the set of states where for each of the states there
is an optimal strategy for B-BOMBER to hit the W-TARGET and stay safe.
More formally, if BB-Safe is the set of states where W-TARGET is hit
and B-BOMBER is safe (see definition of the Draw), then BB-Protect is
the set of states where a BB-Safe strategy exists.

Let WB-Protect be the set of states where for each state there is an
optimal strategy for W-BOMBER to hit B-TARGET and stay safe. More
formally, if WB-Safe is the set of states where B-TARGET is hit and
W-BOMBER is safe (see definition of Draw), then WB-Protect is the set
of states where a WB-Safe strategy exists.

Now, consider the intersection of BB-Protect and WB-Protect,

Protect = BB-Protect ∩ WB-Protect,

i.e., the set of states where the Safe strategy exists, taking into
account that

Safe = BB-Safe ∩ WB-Safe.

Obviously, Protect ⊂ DrawExpand. Thus,

Intercept ∪ Protect ⊂ DrawExpand.
It should be noted that this is a strict inclusion. The existence of
the Destroyed strategy for Intercept, the Safe strategy for Protect,
and the Draw strategy for DrawExpand, together with the fact that

Draw = Safe ∪ Destroyed,

does not mean that

DrawExpand = Intercept ∪ Protect.

Basically, we are saying that the expansion of the union of two sets is
not equal to the union of their separate expansions, i.e., the
distributive law does not hold in this case. Indeed, there might be
states where the draw can be achieved by a strategy leading to states
from both Safe and Destroyed. This means that at some higher levels of
the subtree representing the optimal strategy we would not be able to
distinguish which kind of draw is currently being pursued, Safe or
Destroyed. Consequently, these states might belong only to the direct
expansion of the union

Safe ∪ Destroyed,

which means that

DrawExpand − (Intercept ∪ Protect) ≠ ∅.
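
The failure of the distributive law can be seen on a toy game. The sketch below is only an illustration and is not taken from the LG machinery: it assumes that "a strategy exists" means that our side can force the play into the target set whatever the opponent does, and it computes that set by a simple fixed-point iteration. The three-state game, the state names, and the function name expand are all hypothetical.

```python
def expand(target, moves, mover):
    """Return all states from which 'us' can force the play into `target`.

    moves: dict mapping a state to the list of its successor states
    mover: dict mapping a state to "us" or "them" (who chooses the move)
    """
    reached = set(target)
    changed = True
    while changed:
        changed = False
        for state, successors in moves.items():
            if state in reached or not successors:
                continue
            if mover[state] == "us":
                # One good move is enough when we choose.
                forced = any(s in reached for s in successors)
            else:
                # Every opponent move must stay inside the expansion.
                forced = all(s in reached for s in successors)
            if forced:
                reached.add(state)
                changed = True
    return reached


# Hypothetical game: the opponent moves at "s" and may steer the play
# either to a Safe terminal state or to a Destroyed terminal state.
moves = {"s": ["safe_end", "destroyed_end"], "safe_end": [], "destroyed_end": []}
mover = {"s": "them", "safe_end": "us", "destroyed_end": "us"}

safe = {"safe_end"}
destroyed = {"destroyed_end"}

union_of_expansions = expand(safe, moves, mover) | expand(destroyed, moves, mover)
expansion_of_union = expand(safe | destroyed, moves, mover)
gap = expansion_of_union - union_of_expansions
print(gap)  # the states that only the expansion of the union contains
```

In this toy game the state s is a guaranteed draw, yet neither kind of draw can be forced separately, so s lies in the expansion of the union but in neither separate expansion.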


9.3 A Structure of Expanded Terminal Sets

Let us describe the structure of the subsets introduced above employing
LG tools. We begin with BB-Intercept. In our problem the only element
that can potentially intercept and destroy the B-BOMBER is the
W-FIGHTER. Consider the set of all states where W-FIGHTER is in the
Zone of B-BOMBER, and it is the only intercepting element. First we
define the local BB-Intercept_B-Zone. BB-Intercept_B-Zone can be
described as the set of states of the following set of Zones, the
B-Zone, with one of the following main trajectories:
a(h5)a(h4)a(h3)a(h2)a(h1), a(h4)a(h3)a(h2)a(h1), a(h3)a(h2)a(h1),
a(h2)a(h1). These Zones are nested in each other. Two subsets of
B-Zone are shown in Fig. 9.


Fig.  9.  The  sample  Zones  describing  BB-Intercept  and 
corresponding  gateways. 
The multiple locations of W-FIGHTER designate Zone gateways, the
locations through which W-FIGHTER can enter the Zone employing the
shortest path leading from h8. Once in the Zone (through one of the
gateways), in case of White's turn the W-FIGHTER is able to intercept
the B-BOMBER by pursuing it persistently even in the worst case, i.e.,
if Black does not skip moves in this Zone. For the Zones considered
here the proof of guaranteed interception is trivial: the trajectory of
W-FIGHTER is the 1st negation trajectory, which is by definition of
such a length that interception is guaranteed.

We assume that this Zone is considered independently of the rest of the
elements of the Complex System, which means that the interception of
B-BOMBER is guaranteed if we consider the motions within the B-Zone
only. For different problems and more sophisticated Zones, to prove
that the optimal variant of the Zone's skirmish contains interception
of the main element we would have to use theorems on Network Languages
(Stilman, 1994). The global BB-Intercept is a subset of
BB-Intercept_B-Zone such that interception is guaranteed with respect
to the entire system.

Consider WB-Intercept. In our problem the only element that can
"potentially" intercept and destroy the W-BOMBER is the B-FIGHTER.
Consider the set of all states where B-FIGHTER is in the Zone of
W-BOMBER, and it is the only intercepting element. First we define the
local WB-Intercept_W-Zone. WB-Intercept_W-Zone can be described as the
set of states of the following set of Zones, the W-Zone, with one of
the following main trajectories: a(c6)a(c7)a(c8), a(c7)a(c8).

These Zones are nested in each other. This W-Zone is considered
independently of the rest of the elements of the Complex System, which
means that the interception of W-BOMBER is guaranteed if we consider
the motions within the W-Zone only. Two subsets of W-Zone are shown in
Fig. 10. The subset of Zones shown in Fig. 10 (left) represents Zones
as they are in the initial state. The global WB-Intercept is the
subset of WB-Intercept_W-Zone such that interception is guaranteed
with respect to the entire system.


Fig. 10. Sample Zones describing WB-Intercept.

Consider BB-Protect. In our problem the only element that can
"potentially" protect the B-BOMBER, if necessary, and let it hit the
W-TARGET is the B-FIGHTER. But the B-FIGHTER is involved in
WB-Intercept and cannot leave it. Thus, the B-BOMBER can accomplish its
mission safely only if it does not need protection. This means that the
Zone of B-BOMBER, the B-Zone, should be free of intercepting elements.
Consider the set of all the states where the B-BOMBER is alone, without
any 1st negation trajectories. This is our local BB-Protect_B-Zone.
The global BB-Protect is the subset of BB-Protect_B-Zone such that
protection is guaranteed with respect to the entire system.

Finally, let us consider WB-Protect. In our problem the only element
that can potentially protect W-BOMBER is the W-FIGHTER. Consider the
set of all states where W-FIGHTER is in the Zone of W-BOMBER. Three
possible locations, the gateways, are shown in Fig. 12 (left). The
gateways are shown as multiple locations of the W-FIGHTER, the
locations through which the W-FIGHTER can enter the Zone employing the
shortest path leading from h8. Once in the Zone (through one of the
gateways) the W-FIGHTER is able to protect the W-BOMBER from being
intercepted. The set of all the states where W-BOMBER is protected by
W-FIGHTER from d6, d7, or d8 is included in WB-Protect_W-Zone. Of
course, another way to protect W-BOMBER is for B-FIGHTER to leave this
Zone. One such state is shown in Fig. 12 (right). Once it leaves the
Zone, B-FIGHTER would have no chance to enter it again and intercept
the W-BOMBER. All such states are also included in WB-Protect_W-Zone.
The global WB-Protect is the subset of WB-Protect_W-Zone such that
protection is guaranteed with respect to the entire system.

BB-Protect_B-Zone and BB-Intercept_B-Zone represent different subsets
of states of the same set, the B-Zone:

B-Zone = BB-Intercept_B-Zone ∪ BB-Protect_B-Zone.

A similar statement is true for the Zone of W-BOMBER, the W-Zone:

W-Zone = WB-Intercept_W-Zone ∪ WB-Protect_W-Zone.


Fig.  12.  The  sample  Zones  describing  WB-Protect. 


9.4 Terminal States of the Subtree

Now  we  can  evaluate  all  the  terminal  states  of  the  bold 
subtree  (Fig.  5,  13). 

In particular, the terminal states of the following variants:

1. h8-g7 a6-b6; 2. g7-f6 b6:c6.

1. h8-g7 a6-b6; 2. g7-f6 h5-h4; 3. f6-e5 b6:c6.

1. h8-g7 h5-h4; 2. g7-f6 a6-b6; 3. f6-e5 b6:c6.

belong to the global

Intercept = BB-Intercept ∩ WB-Intercept.

The terminal states of the following variants:

1. h8-g7 a6-b6; 2. g7-f6 h5-h4; 3. f6-e5 h4-h3; 4. e5-d6.

1. h8-g7 h5-h4; 2. g7-f6 a6-b6; 3. f6-e5 h4-h3; 4. e5-d6.

1. h8-g7 h5-h4; 2. g7-f6 h4-h3; 3. f6-e7.

belong to the global

Protect = BB-Protect ∩ WB-Protect.

Now we conclude that all the terminal states of the bold subtree
belong to

Intercept ∪ Protect ⊂ DrawExpand.

Thus, statement 1 is proved.


[Figure labels: full search tree, Destroyed, Safe]
Fig. 13. Terminal states of the bold subtree.

Let us prove statement 2 of the Theorem, that the bold subtree is
optimal (with respect to the full search tree).


9.5 A Zone Status Change

As we know from the above,

B-Zone = BB-Intercept_B-Zone ∪ BB-Protect_B-Zone,

W-Zone = WB-Intercept_W-Zone ∪ WB-Protect_W-Zone.

If we consider Zones independently of the rest of the elements, then,
following the definition of the expanded terminal sets, no Zone can
switch from one terminal set to another. For example, if the current
state of the B-Zone belongs to BB-Intercept and we continue the search
following the optimal branch, the B-Zone will never switch to a state
from BB-Protect; it will reach BB-Destroyed.


Fig. 14. Interpretation of B-Zone and W-Zone and their status change.

The initial state of the B-Zone belongs to BB-Protect_B-Zone; for the
W-Zone it belongs to WB-Intercept_W-Zone.

In the real search every element can move, including the elements that
are currently outside the Zone. This might cause the Zone to switch
from one status to another. In particular, if W-FIGHTER comes to one of
the gateways f6, g6, or h6, the current state of the B-Zone will switch
from the initial BB-Protect_B-Zone (Fig. 14, left) to
BB-Intercept_B-Zone (Fig. 14, right).

If no other elements can interfere and White has enough time for
interception, then the new status cannot be changed: this state
actually belongs to the global BB-Intercept. In case of the W-Zone, if
the W-FIGHTER arrives at one of the gateways e6, e7, or e8, this Zone
switches from the initial local WB-Intercept_W-Zone (Fig. 14, left) to
the global WB-Protect (Fig. 14, right). This status is final because
there are no elements in the Complex System to interfere.


9.6 A Description of Winning Strategies

Now, following the definition of terminal sets, we can give a complete
description of possible W-Win, B-Win, and Draw strategies. Obviously,
in reality, only one of them takes place.

1. W-Win strategy:

W-Win = BB-Destroyed ∩ WB-Safe.

The W-Win strategy, if it exists, is to change the status of both
W-Zone and B-Zone from the initial WB-Intercept_W-Zone and
BB-Protect_B-Zone to the global WB-Protect and BB-Intercept,
respectively.

To do that we have to move W-FIGHTER from h8 into both Zones as fast as
possible. The presence of W-FIGHTER in these Zones immediately converts
their status into WB-Protect and BB-Intercept, respectively, and this
status stays permanently in the search tree until the Zones shrink to
the states where the B-BOMBER is destroyed (the BB-Destroyed state for
the B-Zone) or the B-TARGET is hit (the WB-Safe state for the W-Zone).

2. B-Win strategy:

B-Win = WB-Destroyed ∩ BB-Safe.

The B-Win strategy, if it exists, is to keep the status of both W-Zone
and B-Zone unchanged as they are in the initial state,
WB-Intercept_W-Zone and BB-Protect_B-Zone, respectively, which means
reaching the state where the status is global and belongs to
WB-Intercept and BB-Protect.

To do that Black has to destroy the W-BOMBER and hit the W-TARGET as
fast as possible by shrinking these Zones down to the WB-Destroyed and
BB-Safe states.

3. Draw strategy:

Draw = Safe ∪ Destroyed,

where

Destroyed = BB-Destroyed ∩ WB-Destroyed,
Safe = BB-Safe ∩ WB-Safe.

Thus, the Draw strategy, if it exists, is to change the status of at
least one of the Zones, W-Zone or B-Zone, from the initial
WB-Intercept_W-Zone and BB-Protect_B-Zone to the global WB-Protect and
BB-Intercept, respectively.

To do that we have to move W-FIGHTER from h8 into both Zones as fast as
possible. The presence of W-FIGHTER in these Zones immediately converts
their status into WB-Protect and BB-Intercept, respectively, and this
status stays permanently in the search tree until the Zones shrink to
the states where the B-BOMBER is destroyed (the BB-Destroyed state for
the B-Zone) or the B-TARGET is hit (the WB-Safe state for the W-Zone).
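
The three terminal-set definitions above can be read as a small truth table. The sketch below is only an illustration under a simplifying assumption that is not stated in the text: at a terminal state each BOMBER is either destroyed or safe (it has hit its target), so two booleans describe the outcome. The function name classify is hypothetical.

```python
def classify(b_bomber_destroyed, w_bomber_destroyed):
    """Name the terminal set of a state from the fate of the two BOMBERs.

    Assumption: a BOMBER that is not destroyed is safe, i.e. it has hit
    the opposing TARGET.
    """
    bb_destroyed, bb_safe = b_bomber_destroyed, not b_bomber_destroyed
    wb_destroyed, wb_safe = w_bomber_destroyed, not w_bomber_destroyed

    if bb_destroyed and wb_safe:       # W-Win = BB-Destroyed ∩ WB-Safe
        return "W-Win"
    if wb_destroyed and bb_safe:       # B-Win = WB-Destroyed ∩ BB-Safe
        return "B-Win"
    if bb_destroyed and wb_destroyed:  # Destroyed = BB-Destroyed ∩ WB-Destroyed
        return "Draw (Destroyed)"
    return "Draw (Safe)"               # Safe = BB-Safe ∩ WB-Safe


for bb in (True, False):
    for wb in (True, False):
        print(f"B-BOMBER destroyed={bb}, W-BOMBER destroyed={wb}: {classify(bb, wb)}")
```

Under this assumption the four combinations are exhaustive, so every terminal state falls into exactly one of W-Win, B-Win, or Draw.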


9.7 A Strategy at the Start State

Obviously, we do not know in advance which strategy actually takes
place in this problem. Let us consider the start state for this problem
and assume that White follows the W-Win strategy while Black follows
the B-Win strategy. Following the W-Win strategy, W-FIGHTER should get
into both Zones, the W-Zone and the B-Zone. The distance from the start
state (which belongs to BB-Protect_B-Zone) to the set of states
BB-Intercept can be measured as the length of the shortest trajectory
of the W-FIGHTER from h8 to one of the B-Zone gateways. It is equal to
2 steps. Analogously, the distance from the start state to WB-Protect
is equal to the length of the shortest trajectory of the W-FIGHTER from
h8 to one of the W-Zone gateways. It is equal to 4 steps. The total is
6. However, the distance from the initial state to the intersection
BB-Intercept ∩ WB-Protect can be reduced to 4 steps by employing the
time-gaining trajectory through g7 and f6 (Fig. 15, left). Also, this
is the only way to approach both sets simultaneously. Thus, following
the W-Win strategy, W-FIGHTER moves along the shortest trajectory
passing through g7: 1. h8-g7. Next, Black, following the B-Win
strategy, should keep the status of W-Zone and B-Zone unchanged as it
is in the start state, WB-Intercept_W-Zone and BB-Protect_B-Zone,
respectively. This means that Black should shrink these Zones by moving
either the B-FIGHTER from a6 along one of the intercepting trajectories
to intercept the W-BOMBER within the W-Zone (Fig. 15, right) or the
B-BOMBER from h5 along the main trajectory a(h5)a(h4)a(h3)a(h2)a(h1) to
hit the W-TARGET within the B-Zone.

Fig. 15. The distance measurement to BB-Intercept ∩ WB-Protect from
the start state (left) and from the state generated after
1. h8-g7 a6-b6 (right).
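
The step counts quoted above (2, 4, total 6, reduced to 4) can be checked with a few lines of arithmetic. The sketch below is only an illustration and rests on an assumption that this section does not state: the W-FIGHTER moves one square in any direction per step on an otherwise empty 8 x 8 board, so the distance between squares is the larger of the file and rank differences. The gateway squares are those named in Sections 9.5 and 9.7; the helper names are hypothetical.

```python
def square(name):
    """Convert a square name such as 'h8' to (file, rank) integers."""
    return ord(name[0]) - ord("a") + 1, int(name[1])


def steps(a, b):
    """King-like distance between two squares (assumed reachability)."""
    (fa, ra), (fb, rb) = square(a), square(b)
    return max(abs(fa - fb), abs(ra - rb))


START = "h8"
B_GATEWAYS = ["f6", "g6", "h6"]  # gateways into the B-Zone
W_GATEWAYS = ["d6", "d7", "d8"]  # gateways into the W-Zone

to_b_zone = min(steps(START, g) for g in B_GATEWAYS)
to_w_zone = min(steps(START, g) for g in W_GATEWAYS)

# Shortest route that passes a B-Zone gateway and then reaches a W-Zone gateway.
best_route = min(
    (steps(START, b) + steps(b, w), b, w)
    for b in B_GATEWAYS
    for w in W_GATEWAYS
)

print(to_b_zone, to_w_zone, to_b_zone + to_w_zone)  # separate distances and their sum
print(best_route)                                   # combined distance and the gateways used
```

Under these assumptions the combined route passes through f6, which agrees with the time-gaining trajectory through g7 and f6 described above.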
If Black is involved in the W-Zone and responds, e.g., 1. ... a6-b6,
the W-FIGHTER should continue its motion along the same shortest
trajectory (Fig. 15, right): 2. g7-f6, following the W-Win strategy.

Alternatively, if Black is involved in the B-Zone and moves
1. ... h5-h4, the B-Zone shrinks (Fig. 16). The new set of the shortest
time-gaining trajectories from g7 through f6 to both W-Zone and B-Zone
is shown in Fig. 16. Thus, following the W-Win strategy, White's
response must be the same: 2. g7-f6. Consequently, the White moves (in
bold) 1. h8-g7 X-X; 2. g7-f6 are universal, i.e., they must be included
in the optimal subtree, the W-Win strategy. These moves do not even
depend on Black's first move, 1. ... X-X, whatever it is, even if Black
follows the B-Win strategy.


Fig. 16. The distance measurement from the state after
1. h8-g7 h5-h4 to BB-Intercept ∩ WB-Protect.

According to this strategy Black should keep the status of the W-Zone
and B-Zone unchanged. Let us investigate three possible alternatives
for the Black response 2. ... Y-Y: a), b), and c).

a) If Black is involved in the W-Zone on the first and second moves,
i.e., if 1. h8-g7 X-X; 2. g7-f6 Y-Y and X-X and Y-Y are both in the
W-Zone, then W-FIGHTER at f6 immediately gets into the B-Zone while the
B-Zone itself changes its status to BB-Intercept_B-Zone. This happens
because of the alternation of turns: while making the Y-Y move, Black
spends one time increment in the W-Zone, which gives an extra time
increment to White in the B-Zone (the White turn), and the intercepting
trajectories from f6 become the 1st negation trajectories (Fig. 15,
right); see Section 5. Thus, if White follows the W-Win strategy and
Black the B-Win strategy, we end up with the change of the status of
one of the Zones, the B-Zone, and this status will stay unchanged for
the rest of the search, which means that both W-BOMBER and B-BOMBER
will eventually be destroyed. The state after 2. ... Y-Y is a terminal
state that belongs to the global WB-Intercept ∩ BB-Intercept.
Consequently, we have actually implemented the Draw strategy,

Draw = Safe ∪ Destroyed,

for the case of

Destroyed = BB-Destroyed ∩ WB-Destroyed.

The corresponding branches are included in the bold subtree (Fig. 5).

b) If Black is involved in the B-Zone on the first and second moves,
i.e., X-X and Y-Y are 1. ... h5-h4 and 2. ... h4-h3, respectively, then
the W-FIGHTER should move towards the W-Zone gateways d6, d7, and d8
along the most time-gaining trajectory through e7 (Fig. 16): 3. f6-e7.
This is the terminal state which belongs to WB-Protect ∩ BB-Protect,
i.e., we changed the status of the W-Zone. As in case a), if White
follows the W-Win strategy and Black the B-Win strategy, we end up with
the Draw strategy

Draw = Safe ∪ Destroyed

for the case of

Safe = BB-Safe ∩ WB-Safe.

The corresponding branches are included in the bold subtree (Fig. 5).

c) If Black is involved in the W-Zone on the first move 1. ... X-X and
switches to the B-Zone on the second move 2. ... h5-h4, then W-FIGHTER
should continue moving along the shortest time-gaining trajectory:
3. f6-e5. There are two alternatives.

If Black switches back to the W-Zone, then W-FIGHTER at e5 immediately
gets into the B-Zone while the B-Zone itself changes its status to
BB-Intercept. The situation in this state is similar to case a): this
is the terminal state which belongs to WB-Intercept ∩ BB-Intercept,
i.e., we changed the status of the B-Zone. Thus, if White follows the
W-Win strategy and Black follows the B-Win strategy, again we end up
with the Draw strategy

Draw = Safe ∪ Destroyed

for the case of

Destroyed = BB-Destroyed ∩ WB-Destroyed.

Otherwise, if Black continues in the B-Zone with 3. ... h4-h3, then
W-FIGHTER at e5 should immediately get into the W-Zone through the d6
gateway while the W-Zone itself changes its status to WB-Protect. The
situation in this state is similar to case b): this is the terminal
state which belongs to WB-Protect ∩ BB-Protect, i.e., we changed the
status of the W-Zone. Thus, finally, if White follows the W-Win
strategy and Black follows the B-Win strategy, we end up with the Draw
strategy

Draw = Safe ∪ Destroyed

for the case of

Safe = BB-Safe ∩ WB-Safe.

In case c) the corresponding branches are included in the bold subtree
(Fig. 5).

We proved that the only optimal strategy possible at the start state is
the Draw strategy. The bold subtree is an implementation of this
strategy. Thus, statement 2 and the Theorem are proved.
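
The case analysis can be cross-checked against the six variants listed in Section 9.4. The sketch below is only a compact reading of cases a) to c), not a rule formulated in the paper: it assumes that the kind of draw is decided by the Zone that first receives two Black moves, with B-FIGHTER moves counted as W-Zone moves and B-BOMBER moves counted as B-Zone moves. The function name terminal_set is hypothetical.

```python
W_ZONE_MOVES = {"a6-b6", "b6:c6"}  # B-FIGHTER moves against the W-BOMBER
B_ZONE_MOVES = {"h5-h4", "h4-h3"}  # B-BOMBER moves toward the W-TARGET


def terminal_set(black_moves):
    """Return 'Intercept' or 'Protect' for a sequence of Black moves.

    Assumed rule: the first Zone in which Black spends two moves decides
    the outcome (W-Zone gives Intercept, B-Zone gives Protect).
    Returns None if neither Zone has received two moves yet.
    """
    counts = {"W": 0, "B": 0}
    for move in black_moves:
        if move in W_ZONE_MOVES:
            zone = "W"
        elif move in B_ZONE_MOVES:
            zone = "B"
        else:
            raise ValueError(f"unknown Black move: {move}")
        counts[zone] += 1
        if counts[zone] == 2:
            return "Intercept" if zone == "W" else "Protect"
    return None


# Black's moves in the six variants of Section 9.4.
variants = [
    ["a6-b6", "b6:c6"],
    ["a6-b6", "h5-h4", "b6:c6"],
    ["h5-h4", "a6-b6", "b6:c6"],
    ["a6-b6", "h5-h4", "h4-h3"],
    ["h5-h4", "a6-b6", "h4-h3"],
    ["h5-h4", "h4-h3"],
]
for black in variants:
    print(black, terminal_set(black))
```

Both outcomes are draws, Destroyed for Intercept and Safe for Protect, which is the content of statement 2.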


10.  Discussion 

Theoretical results obtained with other approaches and experiments with
programs utilizing these approaches show that the search tree generated
in order to solve the problem considered in Sections 7-9 consists of
more than a MILLION moves with a branching factor (Nilsson, 1980) of
around 9. In contrast, the Linguistic Geometry tools allowed us to find
the optimal solution by generating a search tree which consists of
about 46 moves, with a branching factor of 1.5. The low branching
factor indicates that the algorithm is goal-oriented.

The novelty of this paper is in the proof of optimality of the
solution. It appears that LG tools are able to distinguish and
significantly expand the "islands of potential stability", the sets of
states (positions) with known value, in the "ocean of all states of
unknown value" that would have to be searched employing a conventional
brute-force approach. In our case we expanded the small islands of
terminal states, Destroyed and Safe, into the sets Intercept and
Protect. Moreover, I envision the LG search as an "optimal navigation
of the ship" from the start state through the ocean of unknown states
to the expanded islands employing the shortest path. In our problem
this was reflected as a motion of the W-FIGHTER towards the W-Zone and
the B-Zone simultaneously. It is likely that similar ideas work in all
the LG examples. This will be a subject of our further research. If
this is the case, we can speculate that LG tools allow for a very
efficient breakdown of the state space that drives the search directly
to the optimum.

It is easy to predict that the power of Linguistic Geometry goes beyond
the domain of aerospace games. The definition of Complex System (see
Section 3) is generic enough to cover a variety of different problem
domains. The core component of this definition is the triple X, P, and
R_p. Thus, looking at a new problem domain, we have to define X, the
finite set of points, i.e., locations of elements. In different
real-world problems we can consider X as an operational district of
underwater or on-surface combat, or even as a set of orbits. The set of
elements P, the mobile units, can be the set of submarines, tanks,
squadrons, or satellites with various moving capabilities. Indeed,
these capabilities are represented by the binary relations R_p, which
is exactly the place for introducing the variable speed, the gravity
impact, the engine impulse duration, etc.
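
A data-structure sketch may make the triple concrete. It is only an illustration: the full definition of Complex System is given in Section 3 and has more components than are shown here. The class name, the field layout, the 8 x 8 board, and the two reachability rules (a FIGHTER stepping one square in any direction, a BOMBER stepping one square straight ahead) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set, Tuple

Point = Tuple[int, int]  # (file, rank), both in 1..8


@dataclass
class ComplexSystem:
    X: FrozenSet[Point]                           # finite set of locations
    P: FrozenSet[str]                             # mobile elements
    R: Dict[str, Callable[[Point, Point], bool]]  # R[p](x, y): p can move from x to y

    def reachable(self, p: str, x: Point) -> Set[Point]:
        """All points y in X such that the relation R_p(x, y) holds."""
        return {y for y in self.X if self.R[p](x, y)}


def king_like(x: Point, y: Point) -> bool:
    """Assumed FIGHTER rule: one step in any direction."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1])) == 1


def straight_ahead(x: Point, y: Point) -> bool:
    """Assumed BOMBER rule: one step up the same file."""
    return y == (x[0], x[1] + 1)


board = frozenset((f, r) for f in range(1, 9) for r in range(1, 9))
system = ComplexSystem(
    X=board,
    P=frozenset({"FIGHTER", "BOMBER"}),
    R={"FIGHTER": king_like, "BOMBER": straight_ahead},
)

print(sorted(system.reachable("FIGHTER", (8, 8))))  # neighbours of the corner h8
print(sorted(system.reachable("BOMBER", (3, 6))))   # the square ahead of c6
```

Moving to another domain then amounts to replacing the set X and the relations in R, for example with relations that encode variable speed or orbital constraints.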

A  dramatic  search  reduction  achieved  in  the  serial  and 
concurrent  games  (Stilman,  1994-1996)  allowed  us  to 
initiate  the  development  of  the  prototype  of  the  system  for 
optimal  planning  and  control  of  the  real  world  aerospace 
combat  with  participation  of  air  fighters,  satellites,  and 
unmanned  aerial  vehicles  -  UAVs.  This  work  is  currently 
under  way  at  Phillips  Lab,  Kirtland  AFB,  NM,  USA.  Also, 
a  prototype  of  the  generic  Hierarchy  of  Formal  Grammars, 
a  test-bed  of  the  multiagent  architecture  for  various 
applications,  is  planned  to  be  developed  at  Sandia  National 
Laboratories, Albuquerque, NM.


XXI  ISAS'96: CONFERENCE REVIEW


prepared  by  A.  Meystel,  K.  Bellman,  D.  Filev,  J.  Goguen,  C.  Hewitt,  C.  Joslyn, 
L.  Kohout,  M.  Kokar,  C.  Landauer,  I.  Muchnik,  L.  Perlovsky,  V.  Stefanuk,  Y.  Yufuk 

This  Review  is  based  upon  reports  submitted  by  the  workshop's  chairmen.  The  complete 
set  of  reports  for  ISAS'96  and  ISAS'97  will  be  combined  into  a  white  paper  entitled  "Semiotics: 
Its  Significance  and  Perspective".  This  will  be  distributed  among  the  organizations  involved  in 
monitoring  and  funding  the  research  and  educational  activities  in  the  US. 


1.  SYNTACTICS  OF  INTELLIGENT  SYSTEMS: 
THE  KINDS  OF  LOGIC  AVAILABLE 

•  The  main  theme  of  the  Syntactics  workshop  was  how  to  deal  with  infinite/complex  structures 
in  modeling  systems  within  the  framework  of  logic. 

• The main reason to look at systems from the semiotic perspective is the COMPLEXITY of
both real and artificial systems (cf. the comment by Professor Sebeok on the existence of
an infinite number of possible signs generated by a finite number of primitives, e.g., in
genetics and the immune system).

•  The  workshop  focused  on  the  available  tools.  The  pervasive  question  was,  how  can  we 
feasibly  use  these  tools? 

• We generally concluded that various system engineering techniques combined within a
framework of logic lead to significant improvement in using semiotic tools to handle the
complexity of both real and artificial large-scale systems.

•  The  following  approaches  have  been  discussed: 

1.  Moshe  Vardi  stated  that  in  order  to  establish  communication  between  (or  among)  agents, 
they  need  common  (shared)  knowledge.  Establishing  such  a  knowledge  base  within  the 
logical  framework  may  lead  to  an  infinite  number  of  message  exchanges.  M. Vardi 
proposed  two  approaches  to  establish  such  common  knowledge  in  a  finite  number  of 
steps. 

2. Wlodek Zadrozny proposed an algebra to represent the ten categories of signs in Peirce's
classification. Although it seems that this algebra should be an infinite one, Zadrozny
showed that it can be finite.

3. Mieczyslaw Kokar showed how to combine many methods from system engineering
within a logical framework to improve system performance. The main point is that the
issues of complexity of logical tools can be remedied by such system engineering tools.
The question is how far we can go in developing such heterogeneous systems into
autonomous ones.

4. Sergey Petrov analyzed languages that are "poor" in the sense that some deductions in
these languages are not possible. His question was whether we can decide if there is a
finite axiomatization for some languages.

5. Jery Tomasik showed that classical results of Rasiowa and Keisler can be used to
construct models of the world based upon inputs from sensors. Sensors can provide an
infinite number of input combinations. Tomasik showed how to address infinity within
the framework of model theory.

6.  William  Farmer  showed  how  an  existing  interactive  theorem  proving  system  (IMPS) 
changes  contexts  via  "theory  interpretation."  IMPS  keeps  a  base  of  "little  theories" 
which  are  examined  by  the  system  for  their  use  in  a  specific  problem. 


2.  MULTIRESOLUTIONAL  CONCEPTS  AND  METHODOLOGIES 

• The multiresolutional (MR) approach is a powerful tool for complexity reduction. The
introduction of multiple resolution levels has actually determined the architecture of all
existing systems, including the architecture of the brain. Multiresolutional methods can be
applied to the following: time, space, descriptions and relations of entities, events,
problems, and plans.

•  Advantages:  MR-methods  allow  us  to  represent  the  world  at  many  levels  of  resolution  due  to 
the  power  of  generalization  algorithms.  This  creates  a  limitation  on  analysis  from  above  and 
from  below.  Total  complexity  of  computations  can  be  drastically  reduced. 

• Engineering methodologies of MR-methods include the following: object-oriented
programming, fractals, MR-signal processing, wavelets, multirate control systems,
multiscale and multigranular representations, and others.

•  Standards  of  modules  and  interfaces  can  be  created  based  upon  properly  introducing 
definitions  for  granularity,  zone  of  indistinguishability,  scope,  scale,  and  cost-functional  of 
interest. 

• Disadvantages: rigidity of the hierarchy seems to be the most pervasive complaint.
Frequently, this complaint is linked with neglect of the overall computational advantages.
However, we should work on developing the trade-off methodologies and on analysis of
the existing restrictions. One of the ways to avoid the disadvantages is to add adaptation
capabilities to MR-systems.

Participants  of  the  workshop  concentrated  on  three  major  themes:  MR-representation,  MR- 
procedures,  and  MR-models. 

I.        MR-REPRESENTATION  IN  DIFFERENT  DOMAINS 

The development of MR-representation is based on studying the results of multiple practical
applications; examples from various areas follow.

Example 1: In all problems of image processing, we deal with a hierarchy of pixels, the units of
granularity ("atoms" or indistinguishability zones of the images). There are three levels of
pixel characterization:

• a component of the pixel (a successor or a "child"), which can be determined for the
pixel at the lower resolution level (its "father");

• the quantitative characterization of the elementary components of the image at the
level (the diameter of the indistinguishability zone);

• the description of the inclusion relationships between the pixel and the pixel's children.
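
The three characterizations in Example 1 can be illustrated with a two-level image pyramid. The sketch below is only one possible reading and was not presented at the workshop: it assumes that a "father" pixel is the average of a 2 x 2 block of "child" pixels, so that the indistinguishability zone doubles in diameter at each coarser level. The function names and the sample image are hypothetical.

```python
def coarsen(image):
    """Average each 2x2 block of `image` into one lower-resolution pixel.

    `image` is a list of equal-length rows with even width and height.
    """
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def children(parent_row, parent_col):
    """Inclusion relation: the four fine pixels covered by one coarse pixel."""
    r, c = 2 * parent_row, 2 * parent_col
    return [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]


def zone_diameter(level):
    """Side of the indistinguishability zone, in finest-level pixels."""
    return 2 ** level


fine = [
    [0, 2, 4, 4],
    [2, 4, 4, 4],
    [8, 8, 1, 3],
    [8, 8, 5, 7],
]
coarse = coarsen(fine)

print(coarse)                                # the lower-resolution level
print(children(1, 1))                        # children of the coarse pixel (1, 1)
print(zone_diameter(0), zone_diameter(1))    # zone size at each level
```

Other aggregation rules, such as taking the maximum or the median of a block, would give a different father value while keeping the same inclusion relation.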
Example 2: Recognition systems for handwriting:

• the indistinguishability zones or elementary entities are irregular structures; the MR-
method allows for the existence of irregular elements of contour images and models
of trajectories. Another example: graphs of interactions within large molecules in
biochemistry.

Example  3.  Elementary  units  of  different  levels  of  resolution  in  various  domains: 

•  atoms  in  physics 

•  scale  units  in  cartography 

• Lyapunov balls in control theory

•  radical  groups  of  molecular  structures  in  chemistry 

•  sub-molecules  (amino  acids  in  proteins)  in  chemistry 

•  hyper  units  (monomer  proteins  in  multi-domain  proteins)  in  chemistry 

• coalitions of proteins related to particular biological functions in biochemistry

• irregular elementary entities (iee) in handwriting analysis: iee from the lower level of
resolution are integrated into trajectories between iee at the higher resolution

Example  4.  Hidden  Markov  process  in  the  area  of  signal  processing 

1st level of resolution: the elements are the parameter vectors of the observed component
of the hidden process and the names of its hidden states.

2nd level: the elements are structural parameters of groups (teams) of first-level elements.

Example 5. Complex systems analysis: the elementary entity is taken at a level characteristic
of object-oriented analysis.

II.  MR-PROCEDURES 

1.  Analysis  at  a  single  level: 

•  aggregation  and  decomposition  of  elements  for  a  single  level  of  resolution 

•  characterization  of  their  uniqueness 

•  determining  their  relationship 

2.  Analysis  of  the  complete  structure  of  the  multiresolutional  architecture: 

• a priori design of the multiresolutional architecture that delivers the minimum
computational complexity for a system

•  development  of  mathematical  models  for  systems  belonging  to  a  particular  resolution 
level 

•  development  of  relations  among  mathematical  models  of  the  same  system  at  different 
levels  of  resolution 

• procedures for switching the single-level analysis on and off

•  generalization  of  results  of  the  single  level  analysis 

•  evaluation  of  correlations  of  these  results  with  final  outcome  and  designing  the  final 
rules  to  get  the  outcome 

•  changing  the  current  multiresolution  structure  of  analysis  and  iterating  it  in  a  new 
form 

•  solution  recognition  (global  stop  rule) 

•  cooperation  with  a  human  assistant 
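
The interplay of single-level analysis, refinement of the structure, and a global stop rule can
be illustrated with a coarse-to-fine search. This is only a sketch under assumptions that do
not come from the workshop: the cost function has a single minimum in the interval, each level
is a uniform grid, and the function name `coarse_to_fine_minimum` and its parameters are
hypothetical.

```python
from typing import Callable, Tuple


def coarse_to_fine_minimum(
    cost: Callable[[float], float],
    low: float,
    high: float,
    points_per_level: int = 11,
    tolerance: float = 1e-3,
) -> Tuple[float, int]:
    """Search [low, high] level by level; return (best x, number of cost evaluations)."""
    evaluations = 0
    while True:
        # Single-level analysis: evaluate the cost on the grid of the current level.
        step = (high - low) / (points_per_level - 1)
        grid = [low + i * step for i in range(points_per_level)]
        best = min(grid, key=cost)
        evaluations += points_per_level
        # Global stop rule: the granule is already finer than the required tolerance.
        if step <= tolerance:
            return best, evaluations
        # Next level: higher resolution, smaller scope around the current best point.
        low, high = best - step, best + step


best_x, n_evaluations = coarse_to_fine_minimum(lambda x: (x - 3.21) ** 2, 0.0, 10.0)
```

A single-level search over the whole interval at the finest step would need thousands of cost
evaluations, while the multilevel search spends only a fixed number per level. This is the
kind of reduction in computational complexity that the list above attributes to a well-designed
multiresolutional architecture.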

III. Mathematical Models of Multiresolutional Systems (MRS)

•  generators  of  hierarchical  structures 

•  hierarchical  structure  algebra 

•  dynamic  models  onto  hierarchical  structures  (new  automata  theory  and) 

•  Methods  of  Simulation  of  MRS 

•  Mathematical  criteria  for: 

*  aggregation  -  decomposition  of  assemblies 

*  abstraction  -  specialization  of  features 

*  generalization - instantiation of features

IV. Efficiency Evaluation for the MRS Approach:

A systematic collection of the benefits and shortcomings of MRS as used for Intelligent
System Development is required.

3.  FUZZY  LOGIC  AND  THE  MECHANISMS  OF  GENERALIZATION 

• The main goal of this workshop is to work toward establishing a link between methods
and concepts from semiotics and those of the fuzzy logic and systems fields.

•  Fuzzy  logic  methodology  enriches  the  logic  field  with  methods  for  handling  linguistic 
statements  in  terms  of  linguistic  variables  introduced  by  Zadeh.  Fuzzy  logic  can  also  deal 
with  approximation  and  incompleteness  issues. 

•  Semiotics,  on  the  other  hand,  provides  methods  to  deal  with  the  meaning  of  linguistic  and 
logic  statements,  paying  attention  to  their  syntax,  semantics,  and  pragmatics. 

• Deeper integration of these two approaches is required, with the aim of strengthening their
applicability in the design, analysis, and use of Intelligent Systems.

• The following issues are discussed:

1.  Granularity,  Generalization,  and  Meaning 

•  The  main  goal  of  semiosis  is  the  interpretation  of  signs  and  symbols  which  can  be  understood 
also  as  recognition  of  their  meaning.  The  meaning  interpretation  unit  is  a  very  complicated 
one.  The  meaning  of  something  can  be  different,  depending  on  the  scale  which  is  used  for 
representation.  This  makes  the  notion  of  granularity  a  very  important  one.  Levels  of 
granularity  are  also  referred  to  as  levels  of  resolution,  and  they  are  closely  related  to 
generalization. 

•  Granularity  was  introduced  into  fuzzy  sets  by  Zadeh.  In  his  interpretation,  "granulation 
involves  a  decomposition  of  the  whole  into  parts,  and  conversely  organization  involves  an 
integration  of  the  parts  into  whole." 

• Intelligent systems deal with interpreting information communicated in symbolic form.
Interpretation and meaning depend on context. A dynamic process considered with all its
details, and with the details of its interaction with its environment, cannot be properly
understood unless the details that are irrelevant within a particular context are removed.
However, the information on which such an interpretation could be based will often be
incomplete. With incompleteness, there may be more than one interpretation. Indeed, a large
family of interpretations may be possible, and its members may conflict. Fuzzy sets, relations,
and logic can play an important role here. They allow us, through the theory of potentiality
(or virtual plurality), to deal with the whole family of virtual outcomes and also to measure
the degree of conflict among individual members of a possible family of outcomes produced by
the meta-process of interpretation.
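
The ideas of granulation, linguistic variables, and conflicting interpretations can be made
concrete with a small sketch. Everything specific here is an assumption made for illustration:
the linguistic variable "temperature," its three triangular granules, and the conflict measure
(one minus the height of the min-intersection of two fuzzy sets) are hypothetical choices, not
definitions given at the workshop.

```python
from typing import Callable, Dict, Iterable

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular membership function rising from a to the peak b and falling to c."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical linguistic variable "temperature" (degrees Celsius) with three granules.
temperature: Dict[str, Membership] = {
    "cold": triangular(-10.0, 0.0, 15.0),
    "warm": triangular(10.0, 20.0, 30.0),
    "hot": triangular(25.0, 35.0, 50.0),
}


def interpret(x: float) -> Dict[str, float]:
    """Degree to which each linguistic term applies to the reading x."""
    return {term: mu(x) for term, mu in temperature.items()}


def conflict(mu_a: Membership, mu_b: Membership, universe: Iterable[float]) -> float:
    """Assumed conflict measure: 1 minus the height of the min-intersection."""
    return 1.0 - max(min(mu_a(x), mu_b(x)) for x in universe)


universe = [x / 2 for x in range(-20, 101)]  # -10.0 to 50.0 in steps of 0.5
reading = interpret(12.0)
cold_vs_warm = conflict(temperature["cold"], temperature["warm"], universe)
cold_vs_hot = conflict(temperature["cold"], temperature["hot"], universe)
```

A reading of 12 degrees belongs partially to both "cold" and "warm," so both interpretations
remain available with their degrees. The two terms overlap only slightly, which the assumed
measure reports as a high but not total degree of conflict, whereas "cold" and "hot" do not
overlap at all.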

2.  The  Role  of  Granulation  as  and  for  Generalization  in  Cognition  and  Action 

•  Generalization  plays  an  essential  role  in  the  human  cognitive  and  symbolic  activities.  But,  it 
is  equally  important  in  artificial  intelligent  systems.  Generalization  works  as  a  filter  of 
information.  During  the  process  of  generalization,  the  relevant  pieces  of  information  are 
included  in  the  final  outcome,  and  the  irrelevant  ones  are  excluded.  Information  to  be 
included  and/or  excluded  depends  on  the  contexts  and  information  processing  goals. 

• Architectures of intelligent systems employ the concept of granularity. Examples of
multigranular (multiscale, multiresolutional) systems are known in robotics and
manufacturing (e.g., NIST-RCS).

•  Some  advanced  intelligent  system  architectures  mimic  biological  models  of  the  human  brain. 
Such  architectures  operate  as  the  coupling  of  functional  hierarchical  (or  heterarchical)  levels 
and  loops.  Each  level  operates  at  a  different  level  of  generality,  abstracting  many  different 
features  from  the  world  with  which  the  architecture  interacts.  Granulation  and  generalization 
are  closely  interlinked. 

• In computer science, the problem of structuring systems is commonly addressed via the
Object-Oriented (OO) approach. This is a set of important design techniques for structuring
systems in a hierarchical fashion. There are, however, some conceptual and practical
problems with OO programming when dealing with highly dynamic systems. Like database
entities, objects are usually assumed to be given. This, however, presupposes hierarchies,
with no accounting for the dynamic binding of levels of individual objects.

• To deal with this problem, we have to examine the essence of inheritance and investigate how
objects emerge. The phenomenon of object emergence is linked with the formation of crisp
and fuzzy classes (generalized groupings). Indeed, we deal with the logical theory of fuzzy
relations, which can be used to expose the inadequacy of the currently used logical structure
of crisp (i.e., non-fuzzy) objects. Logically, generalization is a process in which relevant
properties (intensional specifications) of objects are grouped. We create new objects,
structures given as such by extension, through new intensional specifications. So the OO
approach can be viewed as a very special case of the pragmatics of groupings, a kind of
granulation.

• The issues of granulation in their full generality, however, must be addressed by
employing fuzzy classes, the semantics of which can be provided by relations based on
many-valued logic with special meta-properties. The adequate multi-context semantics
needed for Intelligent Systems requires this generality. The OO approach has to be
extended.
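
One standard way in which a fuzzy relation yields a hierarchy of granules is through the
alpha-cuts of a fuzzy similarity relation. The sketch below is an illustration under
assumptions, not the construction proposed at the workshop: the similarity degrees in
`relation` are invented, and the helper names are hypothetical. It computes the max-min
transitive closure of a reflexive, symmetric fuzzy relation and reads off the partition at
several thresholds.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(r: Matrix, s: Matrix) -> Matrix:
    """Max-min composition of two fuzzy relations on the same finite set."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r: Matrix) -> Matrix:
    """Max-min transitive closure of a reflexive fuzzy relation (repeated squaring)."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def granules(similarity: Matrix, alpha: float) -> List[List[int]]:
    """Partition induced by the alpha-cut of a fuzzy similarity relation."""
    seen, blocks = set(), []
    for i in range(len(similarity)):
        if i in seen:
            continue
        block = [j for j in range(len(similarity)) if similarity[i][j] >= alpha]
        seen.update(block)
        blocks.append(block)
    return blocks


# Invented pairwise similarity degrees among four objects (reflexive and symmetric).
relation = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]
closure = transitive_closure(relation)
levels = {alpha: granules(closure, alpha) for alpha in (0.8, 0.6, 0.4)}
```

Lowering the threshold merges granules, so the partitions are nested: each crisp equivalence
is one level of a hierarchy that the single fuzzy relation encodes. A crisp class hierarchy of
the OO kind corresponds to fixing these levels in advance.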

3.  The  Issues  of  Incompleteness  of  Information 

•  In  Intelligent  Systems  we  must  deal  with  incompleteness  of  information.  This  is  the  domain 
where  fuzzy  sets  can  help.  The  context  in  which  objects  operate  determines  the  goals  and 
meaning  of  the  objects  and  their  generalization  hierarchical  structure.  Without  the  semiotic 
notions,  this  aspect  cannot  be  satisfactorily  handled.  We  need  a  synergistic  development  of 
fuzzy  and  semiotic  methodologies.  One  of  the  classic  notions  of  semiotics  is  the  triadic 
structure  of  the  object,  its  notation  and  its  interpretation. 

• The concept of the semiotic triplet of subdisciplines (syntax, semantics, and pragmatics) was
coined by Morris in the late 1930s. That is where the duality of semiotics and mathematics
comes in. In logical methods of proof, only the "form," the syntactic composition, plays a
role. In the logical theory of models, the primary goal is to interpret syntax in semantic
meta-structures and the pragmatics of the emergence of either of these. Here, fuzzy logic can
play an important role, since we have the duality of linguistic descriptors and the fuzzy
structures to which these descriptors apply. We also have the duality of the symbolic vs. the
numerical, both of which are addressed by what Zadeh called "the fuzzy logic in wider sense."

• When forming groupings (generalizations, classes) of objects on the basis of their
properties, one substantial difficulty appears. One needs to generalize on the basis of the
relevant properties that the objects possess as well as on the basis of the relevance of the
absence of some properties. This leads to entirely new problems, determined by the lack of
properties and by the locality of negation. These problems are closely connected with
semisets and vagueness.

Conclusions 

1. There are potential links between fuzzy logic and semiotics that should be further
explored. Granularity is essential to the development of any complex system. To deal with
granularity in intelligent systems, the potential of fuzzy logic for approximation should be
explored. Fuzzy relations provide a high-level specification language and a computational
tool for forming granules that subsume equivalences, similarities, and hierarchies of objects.
Semiotic concepts and methods can deal with naming, contexts, the dynamics of symbolic
communication, and the pragmatics of naming and actions.

2. There is sufficient background to conduct further discussion and to formulate an action
plan for dealing with the topic of fuzzy logic, granulation, and generalization. This should be
pursued via the Internet. The chair of this workshop is willing to mediate further discussion
and to produce a document summarizing it.


4.  INTELLIGENCE  OF  RECOGNITION 

• The question of the intelligence of algorithms and neural networks is being answered today
in at least five ways. First, mathematical models are developed for psychological
experimental data on perceptual and behavioral phenomena. Second, mathematical tools are
developed for modeling human perceptual functions such as vision. Third, mathematical
models are developed for those functions of intellect that are associated with the internal
workings of the mind, such as meaning and consciousness. Fourth, mathematical modeling of
brain organization is used to seek an understanding of the mind. Fifth, mathematical and
metaphysical analysis of intelligence is being undertaken, which establishes connections
among various mathematical concepts and relates them to metaphysical concepts of
intelligence. These directions are discussed in the workshop.

• Recognition of objects in images and in temporal sequences of images is one of the most
important recognition paradigms. Much information has been accumulated about biological
vision systems, and much effort has been invested in developing mathematical tools and
engineering applications. Although we are still far from completing a general mathematical
theory of vision, significant progress has been achieved in a number of applications, and a
number of useful mathematical tools have been developed. One thing is clear: a vision
system is not homogeneous but combines a large number of diverse subsystems. An
image recognition process is generally separated into several steps or stages, both
in its mathematical formulation and in its hardware implementation. The mathematical and
engineering reason for this is to reduce the overwhelming complexity of the problem.

•  Biological  systems  also  are  known  to  process  visual  information  in  stages,  and  many  of 
our  mathematical  tools  are  patterned  after  biological  vision  systems.  A  typical  breakdown 
includes  detection  of  the  region  of  interest,  segmentation  of  the  objects,  enhancement,  and 
recognition  of  the  object  class  and  pose. 

•  The  following  approaches  have  been  discussed: 

1. Professor D. Casasent discusses developing new powerful and efficient tools, including
new distortion-invariant filters, biologically inspired Gabor wavelet filter techniques,
hierarchical processing, and new representation methods. Casasent also discusses mathematical
methods of combining (fusing) these tools into a unified vision system. Practically important
problems that still await solutions include distortion-invariant detection and recognition;
efficient processing of time-sequential image frames; feature selection; combining
results from several algorithms; and efficient representation techniques.

2. The nonlinearity of human visual processing was recognized in the nineteenth century by
scientists including Helmholtz. He found, for example, that the perceived color of an object
depends on the average color of the visual field. However, nonlinear mathematical methods
suitable for modeling vision and perception in general did not exist at that time. These
methods appeared only recently. The resurgence of neural network research in the 1980s was
due to the emergence of powerful nonlinear neural paradigms. The neural network field can be
viewed as a development of nonlinear mathematical tools suitable for modeling the mind.
However, many of the currently popular neural network paradigms are limited in the basic type
of nonlinear operation performed by a neuron: a nonlinear transformation applied to a
weighted sum of the neuronal input signals. Professor G. Ritter discusses a new morphological
type of neural network, which explores a different type of nonlinearity at the basic neuronal
level. In morphological neurons, multiplication and addition are replaced with addition and
max/min operations.

3. The concept of the morphological neural network has roots in image algebra and in an
alternative hypothesis concerning the properties of biological neurons. Hierarchical
organization is another important aspect of intelligent systems. Specific mathematical
techniques addressing hierarchical organization have received appreciable attention in the
development of wavelet transforms. Professor H. Szu combines wavelet and neural network
techniques to develop adaptive wavelet transforms. He discusses the organization of human
perceptual systems, the mathematical technique, hardware implementations, and applications.

4. The question of the roles of learning vs. a priori knowledge is one of the central and most
controversial problems in the development of intelligent algorithms. While symbolic AI
emphasizes the apriority of knowledge acquisition, most pattern recognition and neural network
techniques emphasize the adaptivity of intelligent systems. In Prof. Minsky's words, "...theory
of representation was developed independent from theory of learning." The model-based
recognition paradigm has been developed to combine the apriority of models with the adaptivity
of model parameters. Prof. E. Manolakos discusses model-based joint segmentation and
recognition of objects. He develops this technique in the framework of parameter estimation
for hierarchical mixture densities. The maximum a posteriori (MAP) estimate of the
parameters is computed by applying a modified version of the Expectation
Maximization algorithm (EM with regularizing constraints applied to multiple-level
hierarchies). The approach is flexible: it allows for non-stationary pixel statistics and
different noise models, is translation and scale invariant, and is well suited for recognition
of partially occluded objects. An unsolved problem that should be addressed in the future is
the development and use of complicated structural models.

5. Most recognition algorithms and neural networks can be viewed as seeking an extremum
(a minimum or, equivalently, a maximum) of an appropriate objective (goal) function during
the recognition or learning phase. In what way is maximization of the goal function related to
intelligence? In what way does it lead to an appropriate generalization from past to future
data? What type of intelligence could be attributed to different types of goal functions?
Professor Golden addresses these questions by establishing a relationship between goal
function maximization and other mathematical concepts of intelligence. Specifically, he shows
that goal maximization is equivalent to an inference procedure on a relational system
comprising the data, the event space, and a relation operation. This theory is useful for
practical applications, as well as for providing insight into the sense in which a given
recognition or neural network algorithm is making intelligent inferences. Future research
should address establishing relationships between neural networks and the statistical pattern
recognition framework, constructing statistical tests of hypotheses about the parameters of
such networks, and statistical goodness-of-fit tests for deciding whether a given neural
network algorithm best fits a given statistical environment.

6. Genetic mechanisms have a fundamental similarity with the human mind in producing
adaptive systems based on a priori structures. This property of genetic mechanisms is used to
develop improved recognition techniques based on genetic algorithms (GA) and genetic
programming (GP). Mr. W. Punch describes recognition techniques based on
genetic mechanisms that find the properties of data that are important for classification
within large datasets. He combines GA with the k-nearest neighbor (knn) algorithm. The
GA determines a weight for each feature, based on known examples, so as to optimize knn
classification based on a linear combination of features. He also describes an extension of
this work to GP. The combined GP-knn technique optimizes data classification by selecting
linear combinations of features and by determining functional relationships among the
features. He also compares the effectiveness of GA and GP for biological problems.
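
The GA-knn combination in item 6 can be sketched as follows. This is a toy illustration, not
the implementation presented at the workshop: the feature weights are assumed to scale a
squared distance, fitness is assumed to be leave-one-out accuracy on the known examples, and
the genetic operators (truncation selection with elitism, one-point crossover, Gaussian
mutation), the synthetic data, and all function names are hypothetical choices.

```python
import random


def knn_predict(train, x, weights, k=3):
    """Majority vote among the k nearest neighbours under a feature-weighted distance."""
    def dist(a):
        return sum(w * (ai - xi) ** 2 for w, ai, xi in zip(weights, a, x))
    nearest = sorted(train, key=lambda sample: dist(sample[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def accuracy(data, weights, k=3):
    """Leave-one-out accuracy of weighted knn on the known examples (the GA fitness)."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, weights, k) == label
    return hits / len(data)


def evolve_weights(data, n_features, generations=20, pop_size=12, seed=0):
    """Evolve one weight per feature; requires n_features >= 2."""
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates.
    population = [[1.0] * n_features] + [
        [rng.random() for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda w: accuracy(data, w), reverse=True)
        parents = population[: pop_size // 2]      # the best half always survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(n_features)          # mutate one weight, clipped to [0, 1]
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0.0, 0.2)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: accuracy(data, w))


# Synthetic data: feature 0 carries the class, feature 1 is large-scale noise.
data_rng = random.Random(1)
data = [
    ((label + data_rng.gauss(0.0, 0.3), data_rng.uniform(-5.0, 5.0)), label)
    for label in (0, 1)
    for _ in range(15)
]
best = evolve_weights(data, n_features=2)
```

Because the best half of each generation is kept and the uniform weight vector is part of the
initial population, the evolved weights never score worse on the fitness measure than
unweighted knn. Whether they generalize to new data would have to be checked on held-out
examples.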


What  can  be  learned  about  intelligent  recognition  architectures  from  brain  studies,  and 
what  type  of  brain  organization  is  suggested  by  combining  current  psychological  and 
neurophysiological  data  with  mathematical  models  of  neural  networks?  Prof.  D.  Levine 
discusses  the  mutual  interaction  between  these  fields  and  the  neural  models  of  the  prefrontal 
function.  The  prefrontal  cortex  performs  the  "executive"  function  within  the  brain.  It  is  roughly 
divided  into  three  interacting  parts:  affective  guidance  of  responses,  linkage  among  working 
memories, and forming behavioral schemata. Neural models of each part are discussed within a
theoretical framework.
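
The difference between a classical neuron and the morphological neuron of item 2 above can be
shown in a few lines. This is a simplified sketch under assumptions: it omits any output
nonlinearity or inhibitory terms that a full morphological network may use, and the function
names and sample values are hypothetical.

```python
from typing import Sequence


def classical_neuron(x: Sequence[float], w: Sequence[float], threshold: float = 0.0) -> int:
    """Weighted sum of the inputs followed by a hard-limiting nonlinearity."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0


def morphological_dilation(x: Sequence[float], w: Sequence[float]) -> float:
    """Max-plus neuron: each product becomes a sum, the summation becomes a maximum."""
    return max(xi + wi for xi, wi in zip(x, w))


def morphological_erosion(x: Sequence[float], w: Sequence[float]) -> float:
    """Min-plus neuron: each product becomes a sum, the summation becomes a minimum."""
    return min(xi + wi for xi, wi in zip(x, w))


inputs = [1.0, 4.0, 2.0]
weights = [0.5, -3.0, 1.0]
fired = classical_neuron(inputs, weights)
dilated = morphological_dilation(inputs, weights)
eroded = morphological_erosion(inputs, weights)
```

In the morphological variants the nonlinearity sits in the max or min itself, so the response
is determined by a single dominant input rather than by the accumulated weighted evidence.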


5.  INTELLIGENCE  OF  LEARNING  AND  EVOLUTION 

• What is intelligent about our recognition and learning algorithms, neural networks, and
intelligent system architectures? Which aspects of intelligence do we understand and know how
to use in our algorithms and neural networks, and which do we not? Does individual
learning resemble the evolution of species, and what are the mechanisms of learning and
evolution? These were the questions posed to a group of distinguished scientists and leaders
in the fields of recognition, intelligent systems, neural networks, and learning who attended
this workshop.

• The response was tremendous; the discussion of intelligence in our mathematical models
of intellect seems to be timely. It is a matter of both heated debate and quickly emerging
results. Both aspects of the problem are discussed in the papers presented at the
workshop: questions that are posed and questions that are being answered. Mathematical
definitions and models of various aspects of intelligence are among the most fascinating and
exciting developments in the history of science.

• The question of the intelligence of algorithms and neural networks is being answered today
in at least five ways. First, mathematical models are developed to explain psychological
experimental data on perceptual and behavioral phenomena. Second, mathematical tools are
developed to simulate human perceptual functions, such as vision. Third,
mathematical models are developed for some of the functions of intellect. The functions of
interest are associated with the internal workings of the mind, such as the emergence of
meaning and consciousness. Fourth, mathematical modeling of brain organization is used to seek
an understanding of the mind. Fifth, mathematical and metaphysical analysis of intelligence is
being undertaken, which establishes connections among various mathematical concepts and
relates them to metaphysical concepts of intelligence.

•  The  following  directions  have  been  discussed  at  the  workshop: 

1. Prof. W. Freeman is concerned with the mechanisms by which brains construct
meanings within themselves and represent those meanings by motor actions in order to
communicate them to other brains: "A machine can only 'know' the inferences it
constructs from the sensory consequences of its own actions into the world." This translates
into the construction of machine intelligence by combining robotics and automata theory, in
the context of operations that are modeled on brain dynamics and implemented by
noise-stabilized chaotic attractors for pattern generation and recognition. Animal and
human performance in the laboratory is simulated by constructing sets of differential equations
that represent cortical dynamics in pattern recognition and solving them on digital
computers, with emphasis on multisensory convergence in combining olfactory,
visual, auditory, and tactile inputs. Future work is being directed toward controlling
chaotic dynamics by regulating the sensitivity to initial conditions, and also toward solving
the equations with analog hardware, taking advantage of its speed and simplicity.
Thereafter, the self-organizing machines will have increasing control of their own input,
through effectors by means of which they can control their sensors and through internal
feedback by which they can regulate the attractor landscapes in their sensory/perceptual
processors for pattern recognition. Freeman calls the prototype of such devices,
modeled on animal brains, a "meaning machine" that can make representations of its internal
states to its trainer.

2. How do categories and symbolic structures develop? How is it possible to adapt and
learn while using a priori knowledge? What constitutes a priori knowledge? What are
the differences and similarities between real-time adaptation and long-term learning of
categories and complicated symbolic structures? Current research directions in this
important area are represented by three talks in the workshop. Prof. R. Sun introduces a
hybrid model for bottom-up skill learning in a situated way. The hybrid model combines
connectionist, symbolic, and reinforcement learning within an integrated architecture. In this
model, agents continuously learn from ongoing experience. Both procedural skills and
high-level knowledge are acquired through interaction with the world. The idea is that learning
can be bottom-up: concepts and generic knowledge can be extracted on-line, during
interaction with the world, from low-level knowledge that is acquired through exploring the
world. This idea differs from traditional top-down learning theories in cognitive
science, in which generic knowledge is given and then compiled into procedural skills
through practice.

3. Prof. L. Goldfarb describes a fundamentally new mathematical model for inductive
learning: the evolving transformation system. Although the model unifies the classical vector
space and symbolic models, it suggests that these two classical paradigms are
fundamentally inadequate for modeling inductive learning processes, which Goldfarb believes to
be the central cognitive processes. The model also suggests a simple formal mechanism
responsible for the emergence of "fuzziness."

4.  Dr.  L.  Perlovsky  introduces  modeling  field  theory,  a  model-based  neural  network  whose 
learning  is  based  on  fuzzy  a  priori  models.  The  modeling  field  theory  combines  a  priori 
knowledge  with  adaptation  and  with  structural  learning.  He  also  discusses  interrelationships 
between  various  mathematical  concepts  of  learning  and  their  surprisingly  close  connections 
to  metaphysical  concepts  of  mind  developed  by  philosophers  since  Plato  and  Aristotle. 
Future research directions include the development of multi-level hierarchical models of
recognition in the context of the recognition-behavior loop.

5. Prof. L. Levitin presents a dynamic model explaining Zipf's Law. Zipf's Law is a
remarkable relationship observed in many complex systems of surprisingly different nature,
including population growth and linguistics. For example, in linguistics the frequencies of
word usage are approximately inversely proportional to the ranks of the words in decreasing
frequency order. Levitin discusses an evolutionary model of the emergent classification of
objects into classes corresponding to concepts and denoted by words. The model leads to the
emergence of a hierarchical two-tier structure with "superclasses," groups of classes with
almost equal populations. This model suggests that both evolution and learning are based on
a priori concepts, while creating new concepts (new species and genera) is a relatively rare
process. Future research directions include determining the parameters of the model for
various systems, for example for linguistic models of discourse in various areas, and
analyzing the relationships between model parameters and the meanings of texts.

6. Does the evolution of species resemble the process of individual learning? Is it random or
directed? What is the role of a priori information coded in DNA in determining the direction
of evolution, and what mechanisms may this process employ? Prof. M. Lane discusses
inter-gene variations of the minimal energy required for mutations and their roles in
directing the evolution process.

7. What is intelligence? Is it possible to act intelligently without being intelligent? This
question has been amplified in Searle's "Chinese Room" thought experiment. The nature of
intelligence is addressed by Prof. L. Kanal. He discusses which types of intelligence can
now be simulated by machines and what truly "being intelligent" is.
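
The rank-frequency relationship stated by Zipf's Law in item 5 above can be checked
numerically: if frequency is inversely proportional to rank, the product of rank and frequency
is roughly constant. The sketch below uses an artificial corpus built to satisfy the law
exactly; the corpus and the helper name `rank_frequency` are hypothetical, and real texts
follow the law only approximately.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_frequency(tokens: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Return (rank, word, count) triples in decreasing order of frequency."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


# Artificial corpus: the word of rank r occurs 60 // r times.
corpus: List[str] = []
for r in range(1, 6):
    corpus += [f"w{r}"] * (60 // r)

table = rank_frequency(corpus)
products = [rank * count for rank, _, count in table]
```

For a natural-language text one would tokenize the text, call `rank_frequency`, and inspect how
far the products drift from a constant across ranks.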

6. THEORETICAL ISSUES OF APPLIED SEMIOTICS


• The main theme of the workshop on Theoretical Issues of Applied Semiotics was to
further advance the existing theoretical premises as far as the application of semiotic
methodology is concerned.

• The logical and philosophical component of the theory of semiotics should incorporate the
relevant results obtained in mathematical logic since Peirce, as well as the views
of the philosophers of our time.

•  Application  of  semiotics  is  now  focused  upon  a  number  of  large  complex  ("open") 
systems. 

•  The  following  issues  were  discussed  in  detail: 

1. In his paper, V. Stefanuk showed an example of how the idea of the semiotic sign, expressed in
the form of the Frege triangular diagram, together with the concept of similarity and the
multiresolutional approach, may be merged to give a complete solution to the
problem of discovering a good representation for a collection of problems derived from a
famous AI puzzle.

2. Jan C.A. van der Lubbe showed that deduction, induction, and abduction are essentially
the same logical procedures based on the Peircean notion of abduction. The difference
emerges when, based upon one (or a few) observations, we abductively construct a predicted
value and the rule which justifies the inference. For a pattern recognition task, he proposed to
compare the automatic abductive reasoning of the system constantly and dynamically with
the classification given by an expert. This should greatly increase classification power.

3. Deborah Vakas Duong described a multi-agent simulation of a sample economic
market. The simulation showed the emergence of correct decisions on agent "investment"
(trade or production). It also showed the emergence of a money-like property in this society.
The author stressed that, unlike the general situation in AI, social models exhibit the
emergence of new signs and their change in the process of social activity.

4. Leonid Perlovsky used a thoughtful survey of the greatest thinkers, including Aristotle,
Goedel, Turing, and Penrose, to show the genetics of complexity for a number of algorithms or
predicates. Perlovsky also showed that exponential combinatorial phenomena may be
overcome by fuzzy logic and a multilevel artificial brain organization.

5. Mark F. Sulcoski et al. showed how various semiotic techniques should be combined
in a practical, military-oriented problem. The authors stressed the practical importance
of Q-analysis, which allows representation of numerical relational data in a geometrical
space.

6. Joseph Goguen described a mathematical framework which operates with signs in a
programming domain. He showed that to describe the whole complexity of sign
construction and the transformation of one sign system into another, one needs a model using
the theory of categories. The presentation provided the ground for discussion at the end of the
workshop. Another topic of an interesting discussion, in which almost everyone present at
the workshop participated, was understanding the intrinsic complexity of signs in
semiotics, their emergence, and their use in practical AI systems.

7.  SEMIOTIC  ISSUES  IN  BIOLOGY,  CONSTRUCTED  SYSTEMS,  AND 
SEMANTIC  FOUNDATIONS  OF  MATHEMATICS 

The following issues were discussed at a group of workshops dedicated to biology, constructed
systems, and semantic foundations:

1.        How  do  we  develop  systems  engineering  that  can  create  and  use  'specifications'  for 
its  human  and  artificial  components?  Alternatives  were  proposed: 
• Creating likable systems

•  Next  stage  in  notions  of  human-machine  interfaces  and  human  factors  is  required 

2. The software analogy was useful for allowing cognitive science to get beyond strict
American behaviorism. But can we invent better models for brain function than
"manipulating representations" or "processing information"? We could then invent/learn new
mechanisms/analogies for constructed systems.

3.  How  do  we  integrate,  mesh  different  representations  of  the  'same'  system? 

•  New  ideas  to  understand  the  act  of  'translation' 

•  For  combining  multiple  sources  of  information/decisions 

•  Levels  of  representation 

4. In many discussions, there was a strange tone of an incipient vitalism combined with an
impoverished model of animal reasoning capabilities (the worst of both worlds). One way to
examine these prejudices is to examine the concept of "reducibility". We started with the
argument that intentionality is irreducible.

5. Intelligent living systems are emotional, social, and physically grounded. What role does each
of these loosely described qualities play in what we call intelligence?

•  Social,  negotiated  context,  meanings,  and  understanding 

•  Emotional,  caring,  instructional,  attentive  systems 

•  Physically  embodied,  situated  systems. 

How  do  we  represent/model? 

6.  Phantom  limbs,  "Neuralmatrix,"  skill  acquisition,  e.g.  "knowing"  how  much  you  know,  how 
well  you  know  it,  irreversible  quality  of  knowing  (can  'forget'  but  one  doesn't  return  to  the 
same  ignorance)  challenge  present  models  of  what  it  is  to  know  and  how  it  is  represented. 

7.  We  want  to  improve  our  models  and  mathematics  with  improved  semantics.  We  want  to 
create  "shareable"  semantics  for  combined  human  and  automated  systems.  How  do  we  need 
to  alter  current  formal  definitions  of  semantics? 

In  all  topics  above,  mathematical  issues  were  also  addressed.  This  is  the  list  of  mathematical 
issues  relevant  to  the  themes  of  discussion: 

•  Integration  Architecture  in  Constructed  complex  systems 

• System Engineering (Processes and Artifacts) & Validation

•  Better  Semantic  Models  of  Systems,  Behavior,  and  Time 

•  Computational  Semiotics 

•  Treatment  of  Symbols  by  Computing  Systems  "Linguistic"  Styles  of  Processing 

•  Cooperative  Autonomous  Systems  (not  control) 

•  Models  of  self  and  others  in  context 

•  Mathematical  Methods  for  Self-Reference,  Context  and  Situation 

•  Model  Integration 

•  Mathematical  Proof,  Formality,  Deductive  Processes. 
8. SYNERGISM/INTEGRATION

1. The papers on attributes/classification for elements of fuzzy sets correlated very well with the
paper by Y. Yufik, who set forth a classification of neuron sets. Y. Yufik also noted some
phenomena that resulted from or affected the human under stress.

2.  P.  Kugler's  paper  was  seminal  because  he  identified  a  major  area  of  research  in  semiotics  that 
has  been  obscured  by  the  heavy  attention  to  models,  logic  control  and  computation.  This  is 
because  none  of  the  above  can  be  done  without  measuring  the  world  and  for  open  systems  the 
major  measurement  component  is  the  human. 

3. The papers by S. Vahie/B. Ziegler and by N. Farhat/E. D. M. Hernandez were very important
because they convey a promise that we may be able to create neural laboratories to establish new
measurement systems.

4. Finally, F. Brown's paper on logistics is promising; it demonstrates a computing language
capable of handling massive amounts of information.

5. George Klir is on the path to integrating probabilistic, plausibilistic, and possibilistic
reasoning into an integrated system.

6. D. Casasent's paper contained excellent results in the area of image fusion.

7.  P.  Werbos'  paper  on  neural  nets,  semiotics  and  brain-like  intelligence  presented  an  issue  of  the 
existing  duality:  computational  vs  logic. 

8. The triangular representation of triadicity was not used by C. Peirce. He used a prong.
R. Burch showed that the prong is not reducible. The triangle can be broken into a
dyad and should be avoided to prevent a wrong conclusion.

9. L. Domatova's paper on combining models presented an application of information fusion.


9.  OPERATORS  MODELING  AS  A  SEMIOTIC  PARADIGM 

The  workshop  has  addressed  the  following  issues: 

1.  Methods  for  designing  on-line  learning  systems  for  acquiring  diagnostic  knowledge  from 
manufacturing  systems. 

2.  Methods  for  estimation  of  unmeasurable  process  variables  by  using  fuzzy  models  of  human 
operators. 

3.  Synthesis  of  supervised  control  strategies  based  on  spatial  aggregation  theory. 

4.  Computational  tools  for  diagnostic  hypothesis  generation. 
The conference ISAS'96 had several plenary lectures and a tutorial.
The plenary lectures:

•  J.  Albus,  A  Roadmap  to  the  Future  of  Intelligent  Systems 

•  P.  Antsaklis,  Hybrid  Control  of  Intelligent  Systems 

• T. A. Sebeok, Evolution of Semiosis and the Origin of Languages

•  L.  Zadeh,  The  Key  Role  of  Information  Granularity  in  Human  Intellige, 

and  the  materials  of  tutorial: 

•  A.  Meystel,  Applied  Semiotics:  Theory,  Methodology,  Toolbox 
will  be  used  for  preparation  of  the  White  Paper. 
A Neural Iconizer for
Semiotic  Processors 

H.  John  Caulfield 
Northeast  Photosciences,  Inc. 
4626  Delina  Road 
Cornersville,  TN  37047 
jcaulfield@vallnet.com

Abstract 

A PCNN (Pulse Coupled Neural
Network) is shown to meet all of the
needs of a semiotic processor for
mining sensor data into symbols or icons.
At the same time, it offers a single solution to the
most long-lasting and vexing philosophical
problems in cognitive science. These
analyses lead us to speculate that the human
symbolic processes which we designed
computers to emulate (e.g. Boole's Laws of
Thought) were almost certainly based on
an earlier human facility: semiotic
processing.

I.  Introduction 

Semiotic processors deal with
concepts and symbols, not numbers.
Humans are inherently symbol makers,
while modern, Turing-equivalent, digital
computers are inherently symbol
manipulators.

Dick's  age  is  twice  Jane's.  Dick's 
age  (in  years)  plus  Jane's  age  (in  years)  is 
36  years.  To  solve  this  in  a  computer,  we 
say 

Symbol generation:
    Let x symbolize Dick's age.
    Let y symbolize Jane's age.
    x = 2y
    x + y = 36

Symbol manipulation:
    Thus x = 24, y = 12.

Symbol interpretation:
    Thus Dick is 24 years old.
    Jane is 12 years old.

The total system (symbolization -
symbol manipulation - resymbolization)
is semiotic. The symbol manipulation is
not.
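
The three stages of the age puzzle can be sketched in Python. This is only an illustration under our own assumptions: the function names `symbolize`, `manipulate`, and `interpret`, the coefficient encoding of the equations, and the use of Cramer's rule are hypothetical choices, not part of the original argument. Note that only `manipulate` is work the machine does unaided; the other two functions simply record decisions a human made.

```python
from fractions import Fraction


def symbolize():
    """Human step: let x be Dick's age and y be Jane's age.

    Each row (a, b, c) encodes the equation a*x + b*y = c.
    """
    return [(1, -2, 0),   # x = 2y   ->  x - 2y = 0
            (1, 1, 36)]   # x + y = 36


def manipulate(equations):
    """Machine step: solve the 2x2 linear system by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = equations
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


def interpret(x, y):
    """Human step: map the symbols back to statements about the world."""
    return f"Dick is {x} years old. Jane is {y} years old."


if __name__ == "__main__":
    print(interpret(*manipulate(symbolize())))
```

Nothing inside `manipulate` refers to Dick, Jane, or ages; that connection lives entirely in the two outer steps.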

To date most if not all semiotic
processors have incorporated humans to
symbolize and resymbolize. Because that
incorporation is universal, it passes
unnoticed most of the time.

If we are to have fully semiotic
processors, we must replace

[Figure: block diagram with blocks labeled WORLD, SENSOR, SYMBOL MANIPULATOR, HUMAN BRAIN, DISPLAY, SYMBOL, HUMAN BRAIN.]

by

[Figure: block diagram with blocks labeled WORLD, SENSOR, SYMBOLIZER, SYMBOL, SYMBOL MANIPULATOR, SYMBOLIZER, DISPLAY (optional), HUMAN BRAIN.]
The full semiotic processor has no need of
a human brain but allows optional human
monitoring.

My goal here is to discuss
symbolizers for semiotic processors.

II. Symbolizer Requirements

The symbolizer takes sensed
data about the natural and the
technological world and produces from
those data symbols for further
manipulation. It is urgent to understand
this powerful functionality of the human
brain before we try to build it.

Remark 1. The number of situations
we may encounter is finite but effectively
infinite in the sense that look up tables are
precluded in principle. Typical 2D spatial
patterns contain of the order of a million
bytes per scene. Suppose we have a million
pixels each with 8 bits (256 shades of gray)
dynamic range. The number of possible
scenes is then

$N = 256^{1000000}$

This meets any definition of effectively
infinite.
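
To get a feel for the size of N, one can count its decimal digits without ever forming the number. The sketch below assumes one million pixels with 8 bits each, i.e. 256 gray levels per pixel; the names `PIXELS`, `LEVELS`, and `decimal_digits` are ours.

```python
import math

PIXELS = 1_000_000   # assumed scene size: one million pixels
LEVELS = 256         # assumed 8 bits per pixel -> 2**8 gray levels


def decimal_digits(levels, pixels):
    """Number of decimal digits of levels**pixels, via logarithms."""
    return math.floor(pixels * math.log10(levels)) + 1


if __name__ == "__main__":
    # N = 256**1000000 has roughly 2.4 million decimal digits.
    print(decimal_digits(LEVELS, PIXELS))
```

A look up table with one entry per possible scene would need a number of entries that itself takes about 2.4 million digits to write down, which is the sense in which such tables are precluded in principle.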

Remark 2. The goal of
symbolization is to convert any of the N
scenes to one of M (M << N) symbols.
Symbolization is compressive. It involves a
deliberate and irreversible loss of
information to facilitate manipulation. For
instance, M may be small enough to permit
look up tables. As a human example,
consider all of the scenes we might label
with the word "dog." That set of scenes is
itself effectively infinite, yet very young,
small humans do it routinely and inerrantly.

Remark 3. There are no God-given
categories in the world. Categories,
up till now, have been human semiotic
creations. The world and the images on
our retinas simply are. Humans group
them. We can make "self-organizing"
computers to group items according to
human-selected criteria. But if we selected
different criteria, the groups would be
different. Consider the figures below.

We may decide to put them in two groups
(a human decision). Does it make sense to
ask how many groups are "really there"?
Two obvious groupings are triangles/circles
and big/small. The latter categories are
valid but fuzzy. Is there a radius $r_0$
such that circles with $r > r_0$ are
big and circles with $r < r_0$ are small?
Who chooses $r_0$? But there are many
other valid groupings, for example, figures
with centroids above and below the
centroid of the whole. All of these are
"right" answers in the sense that they
reduce N=12 figures to M=2 groups. None
is more right than the other.
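
The point that several groupings are equally "right" can be made concrete with a small sketch. The twelve figures below are invented for illustration (the original figures are not reproduced here), and the cutoff `R0`, like the radius discussed above, is an arbitrary human choice.

```python
# Each figure is (shape, size, centroid height). All values are invented
# for illustration; they are not taken from the original figure.
FIGURES = [
    ("triangle", 0.5, 1.0), ("triangle", 2.0, 4.0), ("triangle", 0.7, 5.0),
    ("triangle", 1.8, 0.5), ("triangle", 0.6, 3.5), ("triangle", 2.2, 1.5),
    ("circle", 0.4, 4.5), ("circle", 1.9, 0.8), ("circle", 0.8, 1.2),
    ("circle", 2.1, 5.5), ("circle", 0.5, 0.6), ("circle", 2.4, 3.8),
]

R0 = 1.0  # human-chosen cutoff between "small" and "big"
MEAN_HEIGHT = sum(f[2] for f in FIGURES) / len(FIGURES)


def partition(figures, criterion):
    """Group figure indices by the label that `criterion` assigns."""
    groups = {}
    for index, figure in enumerate(figures):
        groups.setdefault(criterion(figure), set()).add(index)
    return {frozenset(members) for members in groups.values()}


by_shape = partition(FIGURES, lambda f: f[0])
by_size = partition(FIGURES, lambda f: f[1] > R0)
by_height = partition(FIGURES, lambda f: f[2] > MEAN_HEIGHT)

if __name__ == "__main__":
    for name, groups in [("shape", by_shape), ("size", by_size),
                         ("height", by_height)]:
        print(name, sorted(sorted(g) for g in groups))
```

Each criterion reduces the same N=12 figures to M=2 groups, yet the three partitions differ; the data alone do not say which one to prefer.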

Remark  4.  Utility  is  one  way  to 
select  which  symbolization  or 
categorization  to  choose.  At  least  for  the 
immediate  future,  artificial  semiotic 
processors  will  be  special  purpose 
machines.  Groupings  reflecting  our  a 
priori  knowledge  of  the  data  and  problem 
domains  will  be  preferred. 

Remark 5. Symbols must
emphasize the relevant and ignore the
irrelevant - once we decide what is and is
not relevant. If we want to classify the
figures above as big or small, we want to
ignore their shapes.

Remark 6. Combining Remarks 4
and 5, we seek "robust" symbolization in
some problem-dependent sense.

III. Philosophy of Symbolization

Semiotics cannot avoid philosophy.
As seen from Sec. I, semiotic computers
invade a domain previously forbidden to
"mere" symbol-manipulating computers.
Just analyzing the meaning and
implications of the previous sentence is a
major undertaking.

Consider, for example, the strong
version of Church's thesis (almost
universally believed and neither proved nor
disproved). Roughly, it says this: "Any
physical process can be modeled to
arbitrary accuracy with any digital
computer having sufficient memory." But
all digital computers are just symbol
manipulators. Given an input (data and
instructions, which are human concepts not
distinguishable by the computer), there is a
unique output. Every computer is
equivalent to a giant look up table.
Symbols (to the computer) do not "mean"
anything. That is, the symbols do not refer
to anything other than themselves. 1
means 1. 0 means 0. Nothing in a
computer means anything or is a symbol
for anything. Yet we humans mean things
by our symbols, and yet our brains are
computers and subject to computer
manipulation. Where does meaning enter
the world?

In "cognitive science," the question
of symbolization is itself symbolized by
Searle's story of "The Chinese Room." To
most people outside that field, this story
seems to add nothing to their
understanding. The ultimate problem is
this: "If computers mean nothing and if the
human brain is a computer, then the human
brain means nothing. But humans do mean
things. So how can these contradictions be
accounted for?" Searle and others seem to
feel that a true semiotic computer (one that
symbolizes things and associates symbols in
such a way that each "means" the other)
cannot be based on current digital
computers. We humans are somehow more
than "just computers." My reply is that
computers can be far more than Searle
envisions "mere computers" to be. Enter
the semiotic computer.

IV. A Spatial Pattern Symbolizer

If the useful grouping is spatial
patterns (e.g. triangles and circles) or can
be represented that way (e.g. a wavelet
transform of a spoken word displayed as a
2D pattern), then "robustness" may be
taken as the ability to ignore mild
distortions, affine transformations, and the
like.

If, in addition, we do not want to
train the symbolizer, because we do not
know what patterns it may encounter, then
we want a symbolizer which is contractive
(M << N) for all possible input patterns.

We have experimented with PCNNs
(Pulse Coupled Neural Networks) as
symbolizers. We find that they are
contractive for all inputs and extremely
robust.

PCNNs were invented by Eckhorn in Germany to
emulate the observed behavior of the
neurons in the visual cortex of cats. PCNN
efforts in America have been led by J.
Johnson at the U.S. Army Missile
Command. A special issue of IEEE
Transactions on Neural Networks edited
by Dr. Johnson will be published in 1998.
For now, however, the best way to learn
more and to use PCNNs is to access the
web page of the author's previous
employer (www.caos.aamu). The code is
downloadable or usable there.

The basic idea is that each input
pixel drives its own neuron. The neuron is
a classic integrate-fire-reset oscillator, so
neurons driven by brighter pixels pulse
more frequently. Then we perturb the
situation by allowing each neuron to
influence its neighbors. Regions of
more-or-less equal brightness begin to
synchronize - beat together. What results
is a complex 3D pattern (2D in space, 1D in
time). As we seek information
compression, this seems to carry us in the
wrong direction.

Suppose, however, we integrate that
3D signal over 2D space leaving only a 1D
time signal. Experimentally, that time
signal has a periodic strange attractor - a
pattern which more-or-less repeats
indefinitely but never exactly repeats. We
call a single period of that time signal the
"icon" of the input pattern. Million bit
patterns lead to very simple, e.g.
10-position icons. You can see and create
icons at the previously-noted URL.
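
The mechanism described above can be sketched in a few lines of plain Python. This is a deliberately simplified pulse-coupled model written for illustration, not the code available at the URL mentioned in the text and not Eckhorn's full model: the update rule, the four-neighbor linking, and the parameters `coupling`, `decay`, and `boost` are all our assumptions. The function returns the 1D time signal (number of neurons pulsing at each step); it does not attempt to extract a single period as the icon.

```python
def pcnn_time_signal(image, steps=60, coupling=0.2, decay=0.8, boost=20.0):
    """Simplified pulse-coupled network: one neuron per pixel.

    Returns the number of neurons that pulse at each time step, i.e. the
    space-integrated 1D time signal of the 3D (2D space + time) pattern.
    """
    rows, cols = len(image), len(image[0])
    fired = [[0] * cols for _ in range(rows)]
    threshold = [[1.0] * cols for _ in range(rows)]
    signal = []
    for _ in range(steps):
        new_fired = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Linking input: pulses of the four nearest neighbours
                # at the previous time step.
                link = 0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        link += fired[rr][cc]
                activity = image[r][c] * (1.0 + coupling * link)
                if activity > threshold[r][c]:
                    new_fired[r][c] = 1
        for r in range(rows):
            for c in range(cols):
                # The threshold decays, and jumps after a pulse (the reset).
                threshold[r][c] = (decay * threshold[r][c]
                                   + boost * new_fired[r][c])
        fired = new_fired
        signal.append(sum(map(sum, fired)))
    return signal


def block_image(top, left, size=8, side=3, brightness=0.6):
    """Hypothetical test pattern: a bright square on a dark background."""
    image = [[0.0] * size for _ in range(size)]
    for r in range(top, top + side):
        for c in range(left, left + side):
            image[r][c] = brightness
    return image


if __name__ == "__main__":
    print(pcnn_time_signal(block_image(1, 1)))
    print(pcnn_time_signal(block_image(4, 3)))  # same shape, translated
```

Because the signal is a sum over space, translating the square leaves the time signal unchanged in this sketch, which is one simple instance of the kind of robustness sought here. Whether the richer behavior reported for real PCNNs appears depends on model details this sketch omits.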

Another experimental property of
the icons is that they are nearly always quite robust
against mild distortions and affine
transformations. Yet there
are sharp boundaries between attractor
basins. Consider our running example of
triangles and circles. We can distort each
into the other, e.g.

[Figure: a sequence of shapes gradually distorted from a triangle into a circle.]

There is a very sharp dividing line between
the attractor basin to which the "perfect"
triangle belongs and the attractor basin to
which the "perfect" circle belongs.

Note  that  the  PCNN  has  attractors 
for  triangles  and  circles  even  though  it  was 
not  trained  to  recognize  them.  If  we  want 
to  recognize  triangles  and  circles,  all  we 
have  to  do  is  recognize  (name)  the 
attractors. 

In the Jewish/Christian/Muslim
story of Adam (man) and Eve (woman),
God's first assignment to them was to
name the animals. In PCNN terms, this
may say much about the origin of language
- a deep, unsolved problem which bears
strongly on the uniqueness of humans.
Suppose our DNA directs the formation of
sight and sound PCNNs. We may come to
recognize the sight of a dog and the sound
of the word "dog." If these two are linked
by Hebbian association, I am willing to
assert that the sight comes to mean the
sound and conversely.
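
A minimal sketch of such a Hebbian link between sight icons and sound icons follows. It assumes that each icon has already been reduced to a label (here plain strings), and the class and method names are hypothetical; real icons would be time signals, and a real Hebbian rule would act on neural activities rather than on counts.

```python
from collections import defaultdict


class HebbianLink:
    """Associates sight icons with sound icons by co-occurrence."""

    def __init__(self):
        # (sight_icon, sound_icon) -> connection strength
        self.weights = defaultdict(float)

    def observe(self, sight_icon, sound_icon, rate=1.0):
        """Hebbian rule: strengthen the link when both are active together."""
        self.weights[(sight_icon, sound_icon)] += rate

    def sound_for(self, sight_icon):
        """Return the sound most strongly linked to a sight, if any."""
        linked = {snd: w for (sgt, snd), w in self.weights.items()
                  if sgt == sight_icon}
        return max(linked, key=linked.get) if linked else None

    def sight_for(self, sound_icon):
        """Return the sight most strongly linked to a sound, if any."""
        linked = {sgt: w for (sgt, snd), w in self.weights.items()
                  if snd == sound_icon}
        return max(linked, key=linked.get) if linked else None


if __name__ == "__main__":
    link = HebbianLink()
    for _ in range(3):
        link.observe("sight:dog", "sound:dog")
    print(link.sound_for("sight:dog"), link.sight_for("sound:dog"))
```

The association runs in both directions, matching the claim that the sight comes to mean the sound "and conversely."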

It is true that a computer
contemplating its own digits means
nothing. But a special computer - a set of
PCNNs - interacting with the world can
come to mean things. The system - not just
its CPU - and the world come to "ground"
meanings in reality. Thus yet another
fundamental problem can be solved in
principle. The PCNN may be the missing
link between the subsymbolic neural
networks which we know are the basis for
all brain functions and the symbolic
processes you are using at this moment.
The subsymbolic PCNN in interaction with
the world produces a "starting set" of
symbols on the basis of which symbolic
processing arises.


V. Conclusions

I hope that I have convinced you
that the following propositions are true.

(1) For events symbolizable
in terms of spatial relationships, a PCNN is
an iconizer which exhibits all of the a priori
desirable features of an iconizer.

(2) Therefore, the PCNN
offers a useful front end (symbolizer,
iconizer) for a semiotic processor.

(3) PCNNs offer a single
plausible answer to many deep,
long-standing philosophical questions.


Information  Needs  And  Its  Impact  On  Medicine 

Thomas  Hankins,  CNM,  CP. 
Douglas  Frank,  B.S.,  C.C.P. 
Thomas  Williams,  M.D.,  Ph.D. 

Until recently, with the exception of the area of research, the
only information that was pertinent to hospitals and practitioners was
"How much do we bill and how much do we get paid?" Now the
medical community is quickly finding itself faced with a new threat,
a double-edged sword which many are not prepared to take on. That
threat is Managed Care. Managed Care has turned and is turning the practice
of medicine inside out. How do information needs find themselves smack
in the middle of this battle? Every good general knows that the more
you know about your enemy the better off you are, and that you never let
your enemy know more about you than you know about them. That's where
information needs fit in.

For years practitioners were busy taking care of their
patients the best way they could, usually having little more
information on the patient than the patient's medical record. The
insurance carriers, on the other hand, were busy keeping actuarial
tables and different bits of information on large populations of patients.
The insurance companies learned early on that "Those who control
the information... control the game".

This paper will explore the tremendous impact information
needs are having not only on the practice of medicine but on its survival as
well.